Posted to common-dev@hadoop.apache.org by "eric baldeschwieler (JIRA)" <ji...@apache.org> on 2007/09/28 23:40:50 UTC

[jira] Created: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Wildcard input syntax (glob) should support {}
----------------------------------------------

                 Key: HADOOP-1968
                 URL: https://issues.apache.org/jira/browse/HADOOP-1968
             Project: Hadoop
          Issue Type: Improvement
            Reporter: eric baldeschwieler


We have users who have organized data by day and would like to select several days in a single input specification. For example, they would like to be able to say:

'/data/2007{0830,0831,0901}/typeX/'

to input three days of data into map-reduce (or Pig in this case).

(Also, the use of regexp to resolve glob patterns looks like it might introduce some other bugs. I'd appreciate it if someone took another look at the code to see whether there are any file name characters that could be interpreted as regexp "instructions".)
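
To make the requested behavior concrete, here is a minimal sketch of what the {} syntax is meant to select. It is not taken from Hadoop or from any patch on this issue: the class CurlyExpandSketch and its expand method are hypothetical names, and the sketch assumes braces are balanced and not nested.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: expands {a,b,c} alternations into plain strings. */
class CurlyExpandSketch {

  /** Expands the first {...} group, then recurses to expand any later groups. */
  static List<String> expand(String pattern) {
    List<String> results = new ArrayList<String>();
    int open = pattern.indexOf('{');
    int close = (open < 0) ? -1 : pattern.indexOf('}', open);
    if (open < 0 || close < 0) {
      // No complete group left: the pattern stands for itself.
      results.add(pattern);
      return results;
    }
    String prefix = pattern.substring(0, open);
    String suffix = pattern.substring(close + 1);
    // A limit of -1 keeps empty alternatives such as the second one in "{a,}".
    for (String alternative : pattern.substring(open + 1, close).split(",", -1)) {
      results.addAll(expand(prefix + alternative + suffix));
    }
    return results;
  }

  public static void main(String[] args) {
    for (String path : expand("/data/2007{0830,0831,0901}/typeX/")) {
      System.out.println(path);
    }
  }
}
```

With the example from the description, the sketch yields one path per listed day, which is the set of inputs the user wants to hand to a single job.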

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532547 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-1968:
------------------------------------------------

+1
The code looks good. Below are some thoughts.

- Since this code is single-threaded, StringBuilder is more efficient than StringBuffer.

- For this problem, using a parser generator (e.g. yacc) might be better than Java Regex.
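
As a small illustration of the first point: StringBuilder offers the same append/toString API as StringBuffer but without synchronized methods, so swapping one for the other in single-threaded code is usually a one-word change. The class and method names below are hypothetical and are not taken from the patch.

```java
/** Hypothetical illustration, not from the patch: the same join written with both classes. */
class BuilderVsBufferSketch {

  /** Unsynchronized; appropriate when only one thread touches the builder. */
  static String joinWithBuilder(String[] parts, char separator) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(parts[i]);
    }
    return sb.toString();
  }

  /** Same logic; every append call acquires the buffer's lock. */
  static String joinWithBuffer(String[] parts, char separator) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(parts[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] parts = {"data", "20070830", "typeX"};
    System.out.println(joinWithBuilder(parts, '/'));
    System.out.println(joinWithBuffer(parts, '/'));
  }
}
```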


> Wildcard input syntax (glob) should support {}
> ----------------------------------------------
>
>                 Key: HADOOP-1968
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1968
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.14.1
>            Reporter: eric baldeschwieler
>            Assignee: Hairong Kuang
>             Fix For: 0.15.0
>
>         Attachments: curlyGlob.patch


[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1968:
----------------------------------

    Attachment: curlyGlob1.patch

The patch uses StringBuilder instead of StringBuffer. I feel that using lex and yacc would be too big a project for now, so the new patch does not incorporate that suggestion.



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532949 ] 

eric baldeschwieler commented on HADOOP-1968:
---------------------------------------------

Why is glob going through path?  It seems to me that a glob string is not a path string and shouldn't be processed as such.  The resulting match is a list of paths.



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533173 ] 

Hairong Kuang commented on HADOOP-1968:
---------------------------------------

It is for splitting a path name into path components. Matching is done component by component.
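
The component-by-component idea can be sketched roughly as follows. This is not the code from the patch: ComponentGlobSketch, addDir, componentPattern, and the in-memory directory map are all hypothetical stand-ins for a real file system, and the component matcher only understands '*' and {a,b} (balanced, not nested, with no '/' inside the braces).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the Hadoop implementation: glob matching one path component at a time. */
class ComponentGlobSketch {

  /** Toy directory listing (parent path to child names) standing in for a real file system. */
  private final Map<String, List<String>> children = new HashMap<String, List<String>>();

  void addDir(String parent, String... names) {
    children.put(parent, Arrays.asList(names));
  }

  /** Translates a single component into a regex; every other character is matched literally. */
  static Pattern componentPattern(String component) {
    StringBuilder regex = new StringBuilder();
    boolean inCurly = false;
    for (char c : component.toCharArray()) {
      if (c == '*') {
        regex.append(".*");
      } else if (c == '{' && !inCurly) {
        regex.append("(?:");
        inCurly = true;
      } else if (c == '}' && inCurly) {
        regex.append(')');
        inCurly = false;
      } else if (c == ',' && inCurly) {
        regex.append('|');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    if (inCurly) {
      throw new IllegalArgumentException("Unclosed '{' in component: " + component);
    }
    return Pattern.compile(regex.toString());
  }

  /** Walks the tree level by level, keeping only the paths whose next component matches. */
  List<String> glob(String pattern) {
    List<String> current = new ArrayList<String>();
    current.add(""); // the empty string stands for the root directory
    for (String component : pattern.split("/")) {
      if (component.isEmpty()) {
        continue; // skip the empty piece produced by the leading '/'
      }
      Pattern p = componentPattern(component);
      List<String> next = new ArrayList<String>();
      for (String dir : current) {
        List<String> names = children.get(dir.isEmpty() ? "/" : dir);
        if (names == null) {
          continue; // not a directory, or an empty one
        }
        for (String name : names) {
          if (p.matcher(name).matches()) {
            next.add(dir + "/" + name);
          }
        }
      }
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    ComponentGlobSketch fs = new ComponentGlobSketch();
    fs.addDir("/", "data");
    fs.addDir("/data", "20070830", "20070831", "20070902");
    fs.addDir("/data/20070830", "typeX", "typeY");
    fs.addDir("/data/20070831", "typeX");
    fs.addDir("/data/20070902", "typeX");
    System.out.println(fs.glob("/data/2007{0830,0831,0901}/typeX/"));
  }
}
```

One consequence of working per component is that the glob string has to be split on '/' before any pattern is interpreted, which is the step the question above is about.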



[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-1968:
--------------------------------------

        Fix Version/s: 0.15.0
    Affects Version/s: 0.14.1



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532562 ] 

Hadoop QA commented on HADOOP-1968:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12367124/curlyGlob1.patch
against trunk revision r582033.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/890/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/890/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/890/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/890/console

This message is automatically generated.



[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1968:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1968:
----------------------------------

    Component/s: fs



[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1968:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Hairong!



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532851 ] 

Hudson commented on HADOOP-1968:
--------------------------------

Integrated in Hadoop-Nightly #263 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/263/])



[jira] Updated: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1968:
----------------------------------

    Attachment: curlyGlob.patch

This patch allows a glob to use curly brackets as described in the JIRA. It also makes sure that a file name that contains Java Regex special characters does not get interpreted as an instruction.

There is one problem left with globs: glob escaping does not work. See HADOOP-1995 for more details. I will fix the escape problem once HADOOP-1995 is resolved.
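
For readers without the attachment at hand, the following is a rough sketch of the general technique of translating a glob into a Java regex while quoting everything that is not a glob operator. It is an assumption-laden illustration rather than the contents of curlyGlob.patch: GlobToRegexSketch and compile are hypothetical names, only '*', '?' and non-nested {a,b} are treated as operators, the pattern is meant for a single path component, and backslash escaping is deliberately left out.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch, not the patch itself: glob-to-regex translation with literal quoting. */
class GlobToRegexSketch {

  static Pattern compile(String glob) {
    StringBuilder regex = new StringBuilder();
    boolean inCurly = false;
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '*') {
        regex.append(".*");            // any run of characters
      } else if (c == '?') {
        regex.append('.');             // exactly one character
      } else if (c == '{' && !inCurly) {
        regex.append("(?:");           // start of an alternation group
        inCurly = true;
      } else if (c == '}' && inCurly) {
        regex.append(')');
        inCurly = false;
      } else if (c == ',' && inCurly) {
        regex.append('|');             // separator between alternatives
      } else {
        // Quote everything else so that '.', '+', '(', '$' and similar
        // characters in a file name are matched literally.
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    if (inCurly) {
      throw new IllegalArgumentException("Unclosed '{' in glob: " + glob);
    }
    return Pattern.compile(regex.toString());
  }

  public static void main(String[] args) {
    Pattern days = compile("2007{0830,0831,0901}");
    System.out.println(days.matcher("20070831").matches());
    System.out.println(compile("a+b").matcher("aab").matches());
  }
}
```

Quoting each literal character individually is simple but verbose; quoting whole runs of literals at once would produce a shorter regex with the same meaning.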



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531995 ] 

eric baldeschwieler commented on HADOOP-1968:
---------------------------------------------

It is not a blocker, but it would resolve some user issues we'd really like to fix.  If we can get it into 15, it would make some people happy.  But I would not hold the release for this feature.



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531895 ] 

dhruba borthakur commented on HADOOP-1968:
------------------------------------------

Is this a blocker for 0.15 release?



[jira] Assigned: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-1968:
-------------------------------------

    Assignee: Hairong Kuang



[jira] Commented: (HADOOP-1968) Wildcard input syntax (glob) should support {}

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531940 ] 

Doug Cutting commented on HADOOP-1968:
--------------------------------------

> Is this a blocker for 0.15 release?

I don't think so.  We don't usually do new feature blockers, rather only regression bugs.
