Posted to dev@accumulo.apache.org by "John Vines (Created) (JIRA)" <ji...@apache.org> on 2011/11/18 20:28:51 UTC

[jira] [Created] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Add support for wildcards/regexes in locality group setting.
------------------------------------------------------------

                 Key: ACCUMULO-164
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
             Project: Accumulo
          Issue Type: Improvement
            Reporter: John Vines


We should look into adding the ability to specify locality group columns as either wildcarding or regexes. I'm unsure of the feasibility of this, hence the lack of fix date.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Billie Rinaldi (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226890#comment-13226890 ] 

Billie Rinaldi edited comment on ACCUMULO-164 at 3/10/12 4:32 PM:
------------------------------------------------------------------

This isn't a suffix wildcard for a search, it's a wildcard for how the data is grouped in an RFile.  It might be a good exercise to try to find a use case for it, though.  I started thinking about a case where you have two pieces of information in the family, e.g. "snake" + "reptile", where "*reptile" would be the suffix used to create the locality group.  However, I can't think of a good reason not to use "reptile" + "snake" with a prefix wildcard of "reptile*" instead.
                
                  
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, master, tserver
>            Reporter: John Vines
>
> We should look into adding the ability to specify locality group columns as either wildcarding or regexes. I'm unsure of the feasibility of this, hence the lack of fix date.


        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Billie Rinaldi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226890#comment-13226890 ] 

Billie Rinaldi commented on ACCUMULO-164:
-----------------------------------------

This isn't a suffix wildcard for a search, it's a wildcard for how the data is grouped in an RFile.  It might be a good exercise to try to find a use case for it, though.  I started thinking about a case where you have two pieces of information in the family, e.g. "snake" + "reptile", where "*reptile" would be the suffix used to create the locality group.  However, I can't think of a good reason not to use "reptile" + "snake" with a prefix wildcard of "reptile*" instead.
                

        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227635#comment-13227635 ] 

Keith Turner commented on ACCUMULO-164:
---------------------------------------

At this point I am not advocating for a suffix wildcard.  I am just trying to understand what is automatically verifiable as disjoint.
                

        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "David Medinets (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226712#comment-13226712 ] 

David Medinets commented on ACCUMULO-164:
-----------------------------------------

I thought it was more efficient to store strings reversed if suffix wildcards are supported?
                

        

[jira] [Updated] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "John Vines (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines updated ACCUMULO-164:
--------------------------------

    Component/s: tserver
                 master
                 client
    

        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226654#comment-13226654 ] 

Keith Turner commented on ACCUMULO-164:
---------------------------------------

John made the comment offline that determining if a set of patterns matches disjoint sets of column families may not be possible.  I think this may be true for regular expressions.  However, it may be easy to determine this automatically with limited wildcarding.

If only prefix wildcards were allowed, it seems like the following algorithm would ensure they are disjoint.

{noformat}
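  // Returns true if no prefix in the set is a prefix of another one.
  // removeShortestString() is an assumed helper that removes and returns
  // the shortest string in the set.
  boolean isDisjoint(Set<String> prefixes){
     while(prefixes.size() > 1){
       String shortestPrefix = removeShortestString(prefixes);
       for(String prefix : prefixes){
         if(prefix.startsWith(shortestPrefix)){
           return false;
         }
       }
     }
     return true;
  }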
{noformat}

Does this seem correct? For suffixes, startsWith() would be replaced with endsWith().  So maybe we can handle all prefix wildcards or all suffix wildcards.  Can we verify anything else is disjoint?  I do not think so.
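
For anyone who wants to try the check above, here is a minimal runnable sketch of it. The class name PrefixDisjointCheck and the body of removeShortestString() are assumptions made for illustration; the loop itself follows the pseudocode. It works on a copy so the caller's set is left untouched.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

class PrefixDisjointCheck {

  // Assumed helper: removes and returns the shortest string in the set.
  static String removeShortestString(Set<String> strings) {
    String shortest = Collections.min(strings, Comparator.comparingInt(String::length));
    strings.remove(shortest);
    return shortest;
  }

  // Returns true if no prefix in the set is a prefix of another one,
  // i.e. the prefix wildcards cannot match the same column family.
  static boolean isDisjoint(Set<String> prefixes) {
    // Work on a copy so the caller's set is not modified.
    Set<String> remaining = new HashSet<>(prefixes);
    while (remaining.size() > 1) {
      String shortestPrefix = removeShortestString(remaining);
      for (String prefix : remaining) {
        if (prefix.startsWith(shortestPrefix)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

With this sketch, the prefixes "reptile" and "mammal" are reported as disjoint, while "rep" and "reptile" are not, since every family matching "reptile*" also matches "rep*".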

The following wildcards could match overlapping sets.

{noformat}
  *a*
  *b*
{noformat}

And so could the following.

{noformat}
  foo*
  *bar
{noformat}

So even though the literal parts of the above wildcards are unique, they can still match overlapping data. 
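
To make the overlap concrete, here is a small sketch that translates a wildcard into a java.util.regex.Pattern and tests a column family against it. The class and method names are made up for illustration, and it assumes '*' is the only special character in a wildcard.

```java
import java.util.regex.Pattern;

class WildcardOverlapDemo {

  // Translates a wildcard into a regex: '*' matches any run of characters,
  // everything else is matched literally.
  static Pattern toRegex(String wildcard) {
    String[] parts = wildcard.split("\\*", -1);
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        regex.append(".*");
      }
      regex.append(Pattern.quote(parts[i]));
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  // Returns true if the whole column family name matches the wildcard.
  static boolean matches(String wildcard, String family) {
    return toRegex(wildcard).matcher(family).matches();
  }
}
```

Here the family "foobar" matches both "foo*" and "*bar", and the family "ab" matches both "*a*" and "*b*", so neither pair of wildcards describes disjoint locality groups.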
 


                

        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225540#comment-13225540 ] 

Keith Turner commented on ACCUMULO-164:
---------------------------------------

The pattern that's used to create a locality group would need to be kept in the RFile.  This would be used to determine if the locality group contains any columns of interest at seek time.
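
A rough sketch of what that seek-time check could look like, assuming the stored pattern is a prefix wildcard such as "reptile*". The class and method names are hypothetical and this is not the RFile code; it only shows the decision being made.

```java
import java.util.Collection;

class LocalityGroupFilter {

  // Returns true if a locality group created from the given pattern could hold
  // any of the requested column families. Assumes the pattern is either a
  // literal family name or a prefix wildcard ending in a single '*'.
  static boolean mayContain(String storedPattern, Collection<String> requestedFamilies) {
    // No column family filter means the scan wants every group.
    if (requestedFamilies.isEmpty()) {
      return true;
    }
    boolean isPrefix = storedPattern.endsWith("*");
    String literal = isPrefix
        ? storedPattern.substring(0, storedPattern.length() - 1)
        : storedPattern;
    for (String family : requestedFamilies) {
      if (isPrefix ? family.startsWith(literal) : family.equals(literal)) {
        return true;
      }
    }
    return false;
  }
}
```

A group stored with the pattern "reptile*" would be read for a request for "reptilesnake" and skipped for a request for "mammalcat".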
                

        

[jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227645#comment-13227645 ] 

Keith Turner commented on ACCUMULO-164:
---------------------------------------

A prefix wildcard is a simple form of a range.  One other thing we may want to consider is simply having ranges.  So the candidates for this ticket would be prefix wildcards, suffix wildcards, and/or ranges.  These can all be automatically verified as disjoint.
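
To illustrate the prefix-as-range idea, here is a sketch that turns a prefix into a half-open range and checks a list of ranges for overlap by sorting on the start. It uses Java strings and String.compareTo() ordering for simplicity, which is an assumption (column families are really byte sequences), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FamilyRangeSketch {

  // Half-open range [start, end) over column family names.
  // A null end means the range is unbounded above.
  static final class FamilyRange {
    final String start;
    final String end;

    FamilyRange(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  // Smallest string that sorts after every string starting with the prefix,
  // or null if no such string exists.
  static String followingPrefix(String prefix) {
    int i = prefix.length() - 1;
    while (i >= 0 && prefix.charAt(i) == Character.MAX_VALUE) {
      i--;
    }
    if (i < 0) {
      return null;
    }
    return prefix.substring(0, i) + (char) (prefix.charAt(i) + 1);
  }

  // The prefix wildcard "p*" covers exactly the range [p, followingPrefix(p)).
  static FamilyRange fromPrefix(String prefix) {
    return new FamilyRange(prefix, followingPrefix(prefix));
  }

  // Ranges are disjoint if, once sorted by start, each range ends
  // at or before the start of the next one.
  static boolean isDisjoint(List<FamilyRange> ranges) {
    List<FamilyRange> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparing((FamilyRange r) -> r.start));
    for (int i = 0; i + 1 < sorted.size(); i++) {
      String end = sorted.get(i).end;
      if (end == null || end.compareTo(sorted.get(i + 1).start) > 0) {
        return false;
      }
    }
    return true;
  }
}
```

For example, fromPrefix("reptile") gives the range ["reptile", "reptilf"), and the ranges for "rep*" and "reptile*" are reported as overlapping because "req" sorts after "reptile".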

Is allowing this type of setting in locality group configuration worthwhile w/o changing the scanner API?  If we decided to allow prefix wildcards, then it would be easy to create a locality group with millions of actual column families.  However, there is no way on the client side to ask for everything in that locality group without enumerating all possible column families.  It may not be possible to enumerate all possible column families, therefore it may not be possible to read an entire locality group.  To remedy this, the client API and iterator API would need to be changed to allow specification of column prefixes (or ranges, or column suffixes).



                