You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2012/06/16 07:30:42 UTC

[jira] [Created] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

stack created HBASE-6222:
----------------------------

             Summary: Add per-KeyValue Security, get federal funding for HBase?
                 Key: HBASE-6222
                 URL: https://issues.apache.org/jira/browse/HBASE-6222
             Project: HBase
          Issue Type: New Feature
          Components: security
            Reporter: stack


Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14

"The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."

Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396972#comment-13396972 ] 

Andrew Purtell edited comment on HBASE-6222 at 6/19/12 6:39 PM:
----------------------------------------------------------------

Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying tags in the KV should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around? Edit: By that I mean a pointer to where they are serialized in byte[] if doing it lazily as you suggest.
                
      was (Author: apurtell):
    Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying KVs should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around? Edit: By that I mean a pointer to where they are serialized in byte[] if doing it lazily as you suggest.
                  
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393625#comment-13393625 ] 

Matt Corgan commented on HBASE-6222:
------------------------------------

{quote}Worth doing a bake-off?{quote}
If we didn't want to wait for the KV interface development to percolate out to the client code (could take a while), then we'd have to add the tag info to the current KeyValue via Lars's suggestion or something like it so it can work with existing mechanisms like the memstore, client, filters, etc.  The upside is that we wouldn't have to be super-paranoid about adding bloat to the KeyValue since it will someday become just one of the cell implementations.  I guess I'm saying it's maybe ok to muck up the current KV even more given that data block encoding should be able to clean up the mess down the road.  That being said, I don't personally need this feature so I hate to suggest mucking up anything!
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393622#comment-13393622 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Jon, Stack: No we don't have to mimic the Accumulo API though if the goal here is to be an alternative, it must be possible to build a direct API translation shim that provides the same labelling and visibility semantisc.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Jerry Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483960#comment-13483960 ] 

Jerry Chen commented on HBASE-6222:
-----------------------------------

@Jon

bq. From my point of view, I'd like really like to understand more than just accumulo's implementation – I really care about if accumulo's semantics are 1) intentional and required for accumulo use cases and 2) if applications only use a constrained sets of its capabilities. One specific thing I don't quite understand is the ramifications of having column visibility settings are encoded as part of the key and sort order. This could be equivalent expressions that are no longer equals, and some of somewhat goofy future/past views.  

I looked into the accumulo implementation. And one thing that accumulo want to achieve by making the ColumnVisibility as a part of the column key is that the authorization can be enforced without reading the existing records that may be affected by the current mutation. Because the ColumnVisibility is part of the key, you need to explicitly give the ColumnVisibility to identify/match to the column you are targeting to change (put or delete), and thus VisibilityConstraint check can be performed on the given ColumnVisibility with the user's authorization tokens to make sure the user has been authorized for performing the mutation logically on an existing column, without scanning the existing columns that it may make changes on. HBase and accumulo are at the same situation for this.

There may some other issues to be addressed if something is not part of the key and also not part of the value when multiple versions of a logical column existed. For example, a Put with new Visibility values of the key of {row1, family1:qualifier1} will make an logical changes on all the cells with key {row1, family1:qualifier1}, and thus authorization must be checked over all these affected items (which may with different Visibility values) with the user authorization to see whether the Put can be performed or not. And the deletion gets the same thing to consider when considering DeleteFamily and DeleteColumn which logically affect a lot of columns that may have different Visibility values).

One important change on the client API when the Visibility is part of the column key is the Visibility need to be specified either explicitly or implicitly (such as empty Visibility is used when no Visibility provided in the parameters) when performing Put or Delete mutations. This does seem a little strange at a first glance when comparing with approaches used by traditional database row level authorization such as Oracle Label Security. But the question is do we have other better choices both solve the problem and fit into the current framework?

                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393360#comment-13393360 ] 

Lars George commented on HBASE-6222:
------------------------------------

I am thinking of moving the tags between the key and the value of the KV. The existence would need to be tracked somehow, so we could add a single byte as a bit field to flag the presence. Or add a new type value like PUT_WITH_TAGS or some such. That would leave existing KVs the same and only the code needs to be enhanced to enable the actual security layer - which would be a system level coprocessor I presume. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393380#comment-13393380 ] 

Matt Corgan commented on HBASE-6222:
------------------------------------

DataBlockEncoding gives us the opportunity to add features to a KeyValue interface but create different implementations that have the option of ignoring some of the features.  For HBASE-4676 (PrefixTrie), I need to create an interface so that I can store the different parts of a KeyValue in different backing arrays: https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/cell/HCell.java

This discussion reminds me of the memstoreTS in the current KeyValue which is more of an afterthought and has negative performance implications.  With DataBlockEncoding I'm able to treat it as a first class citizen when it is needed and encode it away when it's not needed.  I suppose there is a small cpu cost, but really small relative to what else is happening.

In the future, I think we're going to want to treat the legacy KeyValue and legacy LinkedList<KeyValue> style data blocks as just another DataBlockEncoding (DataBlockEncoding.NONE), and instead do most operations through a KeyValue interface.  After a lot of benchmarking and research on compiler optimizations, I'm pretty sure we are far away from any performance issues with an interface if people are worried about that... i can comment more on that later if anyone wants.
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494519#comment-13494519 ] 

Himanshu Vashishtha commented on HBASE-6222:
--------------------------------------------

[~apurtell]Thanks for the writeup. Yes, your approach is different. Few thoughts:

a) Its not clear how/where you are storing the acls at keyvalue level. You use _acl_ table, or something else? Reading through this gives an idea that the keyvalue level acls have null table, cf and cq attribute with them. Does this mean a user can have only one type of KV level acls in the application. Or in other words, how does the KV acl looks like in the _acl_ table?

b) Currently, all the acl entries are stored in zk (limit of znode is 1 mb); will you be using the same approach?

bq. It doesn't make sense to attach an ACL on a Delete, because the KVs covered by the delete will be ... deleted.
I don't completely agree with this but will not comment also unless I completely understand your approach.

[~yuzhihong@gmail.com] Thanks for taking time to read through it. 
bq.     Versions will be kept for each unique visibility expression.Would this inflate memstore because we are keeping potentially many more versions of KeyValue which differ by visibility expression only ?

Or HFile, right? If so, yes, as different visibility expressions provides different access control to user. If user A does two Put with Visibility A|B, C and then does a delete with visibility A|B, another user with acl of C should be able to see this.

re: ENABLE_CELL_LEVEL_SECURITY
I initially visualized a table level (will give some more thought about CF level... as mentioned in the doc also)
Yes, change it after disabling the table as it will flush out the memstore.

re: Multiple CVFilter
No, only one should be good enough. How come you got this idea? I should fix the doc.

re: Typo: 
Yes, he will get v2 and v4. Thanks for pointing this out.


                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397122#comment-13397122 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

bq. At a high level it seems odd for the co-processor to constrain what can be seen – we definitely would not want to let a "normal" client view the raw underlaying tags or visibility metadata tables!

There would not be a "normal" client in this case, the coprocessor would obviously be installed as a system level coprocessor and the visibility filter would therefore always be active. Didn't mean to imply otherwise somehow. Certainly that's not how I read the comment stream above.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484279#comment-13484279 ] 

stack commented on HBASE-6222:
------------------------------

[~gqchen] A put that broadened visibility would be for the current put only?  How would it effect already-put values?  (I can see how an Increment would have to check current visibility setting to see if it was allowed increment it -- ditto for check and put)  The DeleteFamily, DeleteColumn case is interesting.  HBase would have to change how it does these operation pretty radically -- at least the way we do delete family.  [~larsgeorge] Anything to report?  It has to be per KV?  Or could it be metacolumn/Oracle Label Security-like?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296184#comment-13296184 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Jon we could do it that way but then the changes are core  . A big concern when This first came up is the impact increasing the size of every KV would have for something that won't be the common case. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Eric Newton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397665#comment-13397665 ] 

Eric Newton commented on HBASE-6222:
------------------------------------

For clarification, Accumulo does not support negation in the visibility expressions.  "!" may be reserved for future use, but expressions are limited to terms, "&" or "|" and parens for precedence.

Visibility expressions are not required: the metadata table does not use them, for example.

The visibility needs to be part of the key in order to have different values at different visibilities.

@Jonathan - enjoyed your talk @hadoop summit.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296177#comment-13296177 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

HBASE-4477 could transmit tags in the WAL. 

HBASE-2893 Is one way to manage them in the store. 

Revisit those issues?

Others? New approaches?
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393496#comment-13393496 ] 

Lars George commented on HBASE-6222:
------------------------------------

@Matt If you need to add this anyways, then I would prefer to do this for your problem as well as this one. This kills two issues with one stone. It would also address the future extensibility that adding the KV interface was trying to address anyways.

Have you had a chance to go through Todd's attempt and see if you were able to resolve all of the concerns there? As Stack says, solve the "ripple effect"?

I like this (or my original proposal) best compared to adding meta columns, which seem to also call for a fundamental rearchitecture to add locality groups proper. Feel free to convince me otherwise :)
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393374#comment-13393374 ] 

Andrew Purtell edited comment on HBASE-6222 at 6/16/12 7:33 PM:
----------------------------------------------------------------

Thus far we have various hooks and a proposal for an out of band approach: HBASE-2893, HBASE-4477, API object attributes (HBASE-3811, HBASE-3931, etc.). The impetus (as I understand it) for an out of band approach is the special nature of KV and the desire to avoid adding additional conditionals/complexity in code wherever it touches KV. That is balanced by the problems an out of band approach causes for minimizing overheads when tags are present, as Jon mentions, without locality groups (a significant change) there is extra IO. I see this as a more conservative approach that won't perturb performance sensitive members of our community, because there is zero overhead unless configuration is toggled on, coprocessors are loaded, and such.

However I am not arguing for or against any particular approach. The suggestions by Jon, Lars, and Jimmy are the start of a design proposal but should be pulled together into an actual proposal (as another JIRA?). This is my point. Adding even only one byte to every KV has a very significant effect on storage and IO when the data set is large. Adding a new type avoids that but what is the impact in the code? And it wouldn't be one new type right, there would be a tagged analogue for every mutator? A convincing exploration of these issues would go a long way here IMHO.
                
      was (Author: apurtell):
    Thus far we have various hooks and a proposal for an out of band approach: HBASE-2893, HBASE-4477, API object attributes (HBASE-3811, HBASE-3931, etc.). The impetus (as I understand it) for an out of band approach is the special nature of KV and the desire to avoid adding additional conditionals/complexity in code wherever it touches KV. That is balanced by the performance impact of an out of band approach, as Jon mentions, without locality groups (a significant change) there is extra IO. I see this as a more conservative approach that won't perturb performance sensitive members of our community, because there is zero overhead unless configuration is toggled on, coprocessors are loaded, and such.

However I am not arguing for or against any particular approach. The suggestions by Jon, Lars, and Jimmy are the start of a design proposal but should be pulled together into an actual proposal (as another JIRA?). This is my point. Adding even only one byte to every KV has a very significant effect on storage and IO when the data set is large. Adding a new type avoids that but what is the impact in the code? And it wouldn't be one new type right, there would be a tagged analogue for every mutator? A convincing exploration of these issues would go a long way here IMHO.
                  
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393623#comment-13393623 ] 

stack commented on HBASE-6222:
------------------------------

bq. I would assume the latter, a map of key values for each tag.

This strikes me as odd; keyvalues within the keyvalues (and then keyvalues on the keyvalues in keyvalue) and then along comes Matt w/ his "why isn't everything in the KV implemented as a tag?" which is a reasonable question.  A response would be because we don't want to have to have exotic deserialization and sort but I suppose with Trie we might have that already?  It'd certainly be 'flexible' which is what you want LarsG? (A nice objective that usually comes at performance cost).  A core of required's with optional tags that don't cost unless you use them would be grand.

bq.  I think the data is so important, it should be part of the KV itself.

Good point.  Maybe not even lost, mayhaps a bug would cause us skip the metacolumn?

bq. ....attaching the tag bag after the key.

This seems like it would not require much of a refactor (would have to somehow mark where key ends and tag/bag starts).



                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400209#comment-13400209 ] 

Lars George commented on HBASE-6222:
------------------------------------

As a note, I raised the question of what we need as per the use-cases we see at customers. We are in the process of evaluating this and I should be able to share the requirements here sometime soon. 

One of the questions is where we put policies: either in the labels and therefore the tags on each KV, or in a separate place that applies dynamic runtime policies that match the user to the data labels. 
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393356#comment-13393356 ] 

Jonathan Hsieh edited comment on HBASE-6222 at 6/16/12 5:59 PM:
----------------------------------------------------------------

@Andy I'm not proposing we necessarily do it the way accumulo does it -- the more important point is that we mimic it's semantics to make it easier for users to eventually port to HBase. :) 

The meta column idea you describe in HBASE-2893 and its use cases seems to also argue for  restoring the distinction between locality groups and column families from original bigtable paper. (these are conflated in HBase).  

Using a meta column for cell-granularity visibility seems like a reasonable hbase implementation option.  I'm not convinced why this approach would save significantly more space.  I do think  the not changing core argument is more compelling hbase-specific constraint.

The accumulo implementation has a default case that would make it cost the tags cost extra only on cells with non-default behavior (e.g. only the highly-sensitive or cols with sensitivity varying cells).  They delta encode their keys (including the visiblity tags) like we do in 0.94.

As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space -- we'd could a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags. 
                
      was (Author: jmhsieh):
    @Andy I'm not proposing we necessarily do it the way accumulo does it -- the more important point is that we mimic it's semantics to make it easier for users to eventually port to HBase. :) 

The meta column idea you describe in HBASE-2893 and its use cases seems to also argue for  restoring the distinction between locality groups and column families from original bigtable paper. (these are conflated in HBase).  

Using a meta column for cell-granularity visibility seems like a reasonable hbase implementation option.  I'm not convinced why this approach would save significantly more space.  I do think the not changing the not changing core argument is more compelling constriant.

The accumulo implementation has a default case that would make it cost the tags cost extra only on cells with non-default behavior (e.g. only the highly-sensitive or cols with sensitivity varying cells).  They delta encode their keys (including the visiblity tags) like we do in 0.94.

As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space -- we'd could a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags. 
                  
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393374#comment-13393374 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

Thus far we have various hooks and a proposal for an out of band approach: HBASE-2893, HBASE-4477, API object attributes (HBASE-3811, HBASE-3931, etc.). The impetus (as I understand it) for an out of band approach is the special nature of KV and the desire to avoid adding additional conditionals/complexity in code wherever it touches KV. That is balanced by the performance impact of an out of band approach, as Jon mentions, without locality groups (a significant change) there is extra IO. I see this as a more conservative approach that won't perturb performance sensitive members of our community, because there is zero overhead unless configuration is toggled on, coprocessors are loaded, and such.

However I am not arguing for or against any particular approach. The suggestions by Jon, Lars, and Jimmy are the start of a design proposal but should be pulled together into an actual proposal (as another JIRA?). This is my point. Adding even only one byte to every KV has a very significant effect on storage and IO when the data set is large. Adding a new type avoids that but what is the impact in the code? And it wouldn't be one new type right, there would be a tagged analogue for every mutator? A convincing exploration of these issues would go a long way here IMHO.
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397121#comment-13397121 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Jon You just argued for a metacolumn, no?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493387#comment-13493387 ] 

Ted Yu commented on HBASE-6222:
-------------------------------

@Himanshu:
I have some questions about your design doc.
bq. Versions will be kept for each unique visibility expression.
Would this inflate memstore because we are keeping potentially many more versions of KeyValue which differ by visibility expression only ?

bq. HTable-level property: a property “ENABLE_CELL_LEVEL_SECURITY” in
Since HFile metadata would include similar information, looks like storing such information at column family level is better.

bq. The property “ENABLE_CELL_LEVEL_SECURITY” can be change only after enabling/disabling the table.
I assume that the table is in disabled state when this property is changed.

on page 12:
bq. It attaches the CVFilter to the Get object,
Suppose a user belongs to more than one group, would multiple CVFilter instances be attached to the Get object ?

bq. it passes only the ones which pass the “Secret” visibility expression.He will get v1 and v4.
The above is inconsistent with description on page 11 where v2 and v4 are said to be returned.

Thanks
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393589#comment-13393589 ] 

Matt Corgan commented on HBASE-6222:
------------------------------------

{quote}@Matt So attributes carry in information and we use a new encoding type to store tags adjacent to KVs if they are labelled?{quote}
Yeah, basically you would be required to use the new encoding format (TaggedPrefixTrie, for example), and if you switched back to DataBlockEncoding.NONE, your tags would be erased (we'd want to make that scenario appropriately difficult of course.).  The trie would de-duplicate them, take up no space if they aren't there, and use >=1 byte per cell index if they are there.

{quote}Have you had a chance to go through Todd's attempt and see if you were able to resolve all of the concerns there?{quote}
I was aware Todd tried this at some point but haven't looked into the details or talked to him about it.  I assume  he was attempting a full "extract interface" style refactoring, which I think is probably too big of a change.  My approach is a little different and sets in place a strategy for gradually migrating to the interface.  From the top of the HCell class linked to above, you can see these methods that require any implementation to be able to create legacy KeyValues.
{code}
//V1 is the byte[] format of the current KeyValue implementation.
//for building a result to send to client
int getV1KeyLength();
int getV1Length();
byte[] getV1KeyValueArray();
KeyValue getKeyValue();
ByteBuffer getV1KeyBuffer();

int appendKeyToV1ByteArray(byte[] output, int offset);
int appendToV1ByteArray(byte[] output, int offset);
{code}
This lets the interface start in the data block encoders and slowly grow out into the storefiles, filters, etc, and eventually out to the client.  Eventually we deprecate those methods from the interface and try to finalize the transition.  One problem with this approach with regard to the tags is that only the storage layer would know about them at first, and it would take some time to expose them to the client.

All-in-all, the tags raise some interesting questions in my mind like "why isn't everything in the KV implemented as a tag?"  For example, i'd love to see per-cell TTL's someday.  Maybe tags go hand-in-hand with co-processors to implement features like that?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296117#comment-13296117 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

It's rather shocking to see congressional level discussion about  NoSQL winners and losers. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396190#comment-13396190 ] 

Keith Turner commented on HBASE-6222:
-------------------------------------

The security label in the Accumulo key is called Column Visibility.  Its part of the sort order and uniqueness of an 
Accumulo key.  The key is sorted on row, then column fam, column qual, then column vis, then timestamp, then delete flag.  So for example the following entries would be distinct and sort in the following way.

row=1234 cf=name cq=last cv=S&GOLF ts=5 val=Smith
row=1234 cf=name cq=last cv=S&GOLF ts=4 val=Smit
row=1234 cf=name cq=last cv=U ts=9 val=Doe

                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396197#comment-13396197 ] 

Lars George commented on HBASE-6222:
------------------------------------

A comparable concept in the RDBMS world is Oracle's label security:

http://www.oracle.com/us/products/database/options/label-security/overview/index.html

This is much more involved as it has labels, departments, and so on. It also combines default labels with runtime ones, making up the actual value to check against. The data labels are stored in a hidden column in the table and is for the entire row.

Accumulo stores the labels with the policy in each KV, for example "A&B&(D|E)". The system level filter evaluates the labels and compares them to the actual authorization details of the user. It then let's the user access the data or not. So this is simpler compared to OLS.

I was thinking that adding "tags" is the actual support feature to enable the same functionality. Then we need a coprocessor to apply the rules. The part we do not have here is the authorization against the labels. The labels in Accumulo are created ad-hoc, which means we would need to attach the user authorization in the ACL table, but that can be cached. 
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393618#comment-13393618 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Matt

{quote}
bq. @Matt So attributes carry in information and we use a new encoding type to store tags adjacent to KVs if they are labelled?

Yeah, basically you would be required to use the new encoding format (TaggedPrefixTrie, for example), and if you switched back to DataBlockEncoding.NONE, your tags would be erased (we'd want to make that scenario appropriately difficult of course.). The trie would de-duplicate them, take up no space if they aren't there, and use >=1 byte per cell index if they are there.
{quote}

So we may have this as a way to store tags inline with data, with dedup/optimize away if not needed; and we may have Lars' somehow tag structure addition to KV (Lars: what would that look like?). Worth doing a bake-off? 
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296151#comment-13296151 ] 

Lars George commented on HBASE-6222:
------------------------------------

I was thinking about this as well a while back when Accumulo came up. In that context I thought it might be good to have the option to add "tags" to each KV. This can be used for security or other purposes, like replication or transactional support. Another use case is filtering as these tags could have special features like caching or indexes. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-6222:
-------------------------------------

    Assignee: Andrew Purtell
    
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393676#comment-13393676 ] 

Laxman commented on HBASE-6222:
-------------------------------

Good to see lot of interest and discussion in this.

I couldn't understand the above discussion completely.
But, I've some basic questions.
1) Do we really have any known use-cases for KV based access control? 
2) Does scalability affected badly for the users who needs at KV level access control?

How about other approach of supporting access control through *HBase views*?
If we want to restrict access for a CF/CQ and a set of rows to a user/group then create a view for that set and control access through view.

It's just an idea based on my understanding of RDBMS (Oracle). Not really sure if this is really feasible in HBase. Please respond with use-cases and comments.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393617#comment-13393617 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Lars

I was following up to this point:
{quote}
I like this (or my original proposal) best compared to adding meta columns, which seem to also call for a fundamental rearchitecture to add locality groups proper.
{quote}

That's not _essential_ to storing labels in a metacolumn, though it may be advisable for performance reasons.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393492#comment-13393492 ] 

Lars George commented on HBASE-6222:
------------------------------------

bq.  If we add a ColumnVisibility-like expression into the KV, we'd have to update Comparators to exclude this portion from inclusion in sort.

That is the whole point of my proposal up top, using the new type and attaching the tag bag *after* the key, the default code path stays untouched. 

Sure this needs some extra conditional statements, but is easy to test for. The rest of the tests need no overhaul. All the other approaches strike me as overkill - although they might be worthwhile in their own right. So the question is, what weapon do we choose, so that we solve one problem and hopefully get the most collateral benefit?

Converting KVs into an interface and subclassing them, as you say, seems as well as a major overhaul of the code base. Do we really want this right now? And what does it give us besides the basis for the problem at hand?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397057#comment-13397057 ] 

Jonathan Hsieh commented on HBASE-6222:
---------------------------------------

>From my point of view, I'd like really like to understand more than just accumulo's implementation -- I really care about if accumulo's semantics are 1) intentional and required for accumulo use cases and 2) if applications only use a constrained sets of its capabilities.  One specific thing I don't quite understand is the ramifications of having column visibility settings are encoded as part of the key and sort order.  This could be equivalent expressions that are no longer equals, and some of somewhat goofy future/past views.

Another thought: At a high level it seems odd for the co-processor to constrain what can be seen -- we definitely would not want to let a "normal" client view the raw underlaying tags or visibility metadata tables!

IMO I'd probably prefer a completely generic tag system for HBase only after we have a few different serious use cases (possibly existing ones!) that would use it.  Making something overly generic introduces its own set of new problems.

@Keith I'll search on the accumulo mailing to see if things will answer my semantics questions -- and if I don't find it I'll shoot off some questions there.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396972#comment-13396972 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying KVs should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393697#comment-13393697 ] 

Lars George commented on HBASE-6222:
------------------------------------

bq. @Andy That's not essential to storing labels in a metacolumn, though it may be advisable for performance reasons.

Understood. I am not saying this is needed or that metacolumns do not work without. In fact, I think that they are very useful in the context you discussed with Matt, i.e. for example TTLs. I personally think there is a need for two optional features: a) metacolumns - which cover broader rules for a many columns or rows, and b) KV tags - which are carried as low as they can get to retain per cell information. 

So for TTL I would think that the tags are too low, yet for security I do think that metacolumns are too "weak" of a guarantee.

bq. @Andy and @Matt: So we may have this as a way to store tags inline with data, with dedup/optimize away if not needed; and we may have Lars' somehow tag structure addition to KV (Lars: what would that look like?). Worth doing a bake-off?

I think this is not either or, but - and Matt please correct me if mistaken - if we add Trie compression then we can leverage the implementation to handle it. If we decide not to merge the two, then we can use my suggestion of adding them to the KV optionally and we can handle the compression implications later.

bq. @Andy: We could agree on criteria such as: Tag storage optimized out if no tags present

Indeed, since we use a new type, no extra storage is needed if no tag is attached.

bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).

bq. @Andy: Generic mechanism for adding, reading, removing, and modifying tags, usable by coprocessors.

These are the KeyValue.addTag(byte[] name, byte[] value) and KeyValue.getTag(byte[] name) helpers I was referring to. The coprocessors has full access that way, since the tags are carried for each KV.

bq. @Andy: No we don't have to mimic the Accumulo API though if the goal here is to be an alternative, it must be possible to build a direct API translation shim that provides the same labelling and visibility semantisc.

Indeed. One of the arguments I hear comparing HBase and Accumulo is the fact that we have no cell level security tagging. That is what this is all about. My proposal is - as much as I can tell - lean (as it uses no extra storage if not used), can be combined with the non-cell level security (you might not want this level of security to avoid extra baggage), does not change the comparators, and overall is quite non-intrusive in existing code. On the other hand it seems useful for other cell level features in the future.

As Jon says, Accumulo uses these tags and the always-on filter to achieve security (on a very high level view), and so can we then. For me this is comparable then. We do not need to comply to the entire API, but feature set level only.

bq. @stack: A core of required's with optional tags that don't cost unless you use them would be grand. 

That is exactly my point. As for "KV in KV", I do not see how this is "odd" as our KeyValue for starters is the odd one given what most people understand of what a KV is. Coming to terms with our complex key and various sorting rules is not trivial.

bq. @stack: Good point. Maybe not even lost, mayhaps a bug would cause us skip the metacolumn?

Spot on!

bq. @Matt: I guess I'm saying it's maybe ok to muck up the current KV even more given that data block encoding should be able to clean up the mess down the road. That being said, I don't personally need this feature so I hate to suggest mucking up anything!

Agreed, this is about timing as well. Your patch is highly intrusive - but for good reasons. So I would love to discuss this current issue with your changes already applied. But on the other hand we have to make a call for what we want and when?

@Laxman: The basic premise here is to be on-par security wise with Accumulo. That is the use-case. As for scalability, I do not see why a few extra bytes and a coprocessor that checks them is disastrous. Sure, this needs evaluation, but we know that other systems - like Accumulo - does it, so if someone wants to enable it, they should see the same impact. Small or big. Or asking the other way around, where do you see this could affect the performance?

bq. How about other approach of supporting access control through HBase views?

The issue is that these are typically only on the row level. With the cell level you can filter as fine grained as possible. Views - and please object if I am wrong - are more coarse grained. Think of blocking access to some columns differently across many rows. Not just all CF/CQs allowed for all rows.

That latter is the crucial difference of what is needed to be on-par.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13305339#comment-13305339 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Jimmy Such a proposal must address the impact on code. So is that a subclass of KV? Or a bunch of conditional code wherever KVs are handled?
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296182#comment-13296182 ] 

Jonathan Hsieh commented on HBASE-6222:
---------------------------------------

Here's a good link with high level semantics of their implementation.  

http://accumulo.apache.org/1.4/user_manual/Security.html

I've spent a few hours looking at the accumulo code.  Internally it is:
* an extra visibility/tags field in their cell Key which supports AND, OR, NOT boolean operators.
* an always on visibility filter on the tablet/regionserver side.
* a constraint policy option that determines behavior if a users attempts to write to labels they do not have read access to:
** allow
** throw security exceptions
* authentication credentials and authorization plumbing



                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395834#comment-13395834 ] 

Anoop Sam John commented on HBASE-6222:
---------------------------------------

I dont think the approach that we are discussing is, to keep the ACL details of each cell in the _acl_ table. So that wont be a concern as such
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393377#comment-13393377 ] 

Lars George commented on HBASE-6222:
------------------------------------

I agree with the need of discussing what the objective is and how to get there with the least amount of impact on existing code. That is why I like the idea of reusing the type field. 

I would only add one more type (the one mentioned) and that, together with a global switch, would trigger the code to parse the tags located after the key but before the value (or maybe after the value as it is length prefixed). That tags once parsed can be access then by the coprocessor to do the filtering etc.

So we would need a way to attach tags to a KV as well as access them once they are parsed out. Seems reasonable?
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494755#comment-13494755 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

I forgot to mention something and want to highlight something.

The new KVs for Append and Increment inherit the KV tags (hence ACLs) of the value(s) they are updating. This is both logical and works around the fact that Increment is not a Mutation (see HBASE-7114).

If [~v.himanshu] or someone else wants to implement Accumulo-style visibility labels (or if I ultimately end up doing it), then I encourage following the same design principles:
- Coprocessor based implementation
- Minimal to no changes to core code
- Perhaps just building on a KV generic tags facility
- Use OperationWithAttributes#{get,set}Attribute for passing through your new metadata

Then, you can see how what I describe above and perhaps something else that implements Accumulo-style visibility labels can be consistent in deployment and API details *and can be easily stacked on top of each other*.

                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393395#comment-13393395 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

@Matt So attributes carry in information and we use a new encoding type to store tags adjacent to KVs if they are labelled?

@Jon For counters definitely not 1, option 2 seems better. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395815#comment-13395815 ] 

Laxman commented on HBASE-6222:
-------------------------------

bq. The basic premise here is to be on-par security wise with Accumulo. That is the use-case.

IMHO, that's one implementation but not use-case. Definitely, Accumulo would have some straight use-case. Do we that use-case? Based on use-case, we can brainstorm on different approaches (KV level, Views, something else may be).

bq. where do you see this could affect the performance?

I have following concern w.r.to *scalability*.

* With current implementation, ACLs are cached. With cell level, it may grow heavily.
* Please take a look @AccessController.permissionGranted method. We need to call this method(+some more checks for KV based) for every KV. This may become a hotspot when we introduce KV based access control.

We are currently evaluating performance with security enabled. Soon, I will share our report.

bq. Think of blocking access to some columns differently across many rows.
I agree. Can you please explain how do we solve this with a traditional RDBMS like Oracle. 

*Note:* I definitely don't want to bring up the well known discussion "SQL vs NOSQL" here and I'm only trying to understand the use-case as a HBase user/developer. Only point I want to put forward is we should have proper understanding of use-case and user before we start on a approach/solution.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393390#comment-13393390 ] 

Jonathan Hsieh commented on HBASE-6222:
---------------------------------------

Do we think of the goal of faithfully mimicking accumulo's is reasonable?  (my first impression is yes). If so, I think we'd need to define those semantics and to do that we'd need bring up an accumulo cluster (or spend more time in the code) to see what its semantics are.  

Here's the accumulo api.  (ColumnVisibility is tag expression)
http://accumulo.apache.org/1.4/apidocs/org/apache/accumulo/core/data/Mutation.html

You can see that delete mutations in accumulo also have visibility expressions -- and I'm not actually sure what the semantics are.  Let's say you have a "public" put on row A, and then a "sensitive" delete of the entire row A.  Does someone with the only the "public" authorizations see the old row A, or no row A?  If it was no row A, would they be able to do a versioned query and get the old row A (and could then infer out that there was a sensitive delete?)...

Also, I didn't see atomic increment operator in accumulo.  If it doesn't exist we'd have to define that behavior. Some options:
1) No visibility settings on hbase increment mutation operations, thus only keep only one value always readable by all
2) Add some mechanism to set counter visibility so we could to require authorization to read. 
3) Keep one counter for each visibility tag on a particular cell... (yuck).
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-6222:
----------------------------------

    Summary: Add per-KeyValue Security  (was: Add per-KeyValue Security, get federal funding for HBase?)
    
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393488#comment-13393488 ] 

Lars George commented on HBASE-6222:
------------------------------------

bq. @Lars When you say "...I thought it might be good to have the option to add "tags" to each KV", what would a tag look like? A bit set? Or some name/values within the KV?

I would assume the latter, a map of key values for each tag. 

Please note that I would like for these tags NOT just being used for security, but also as something we can use in a broader sense. Like the per KV memstoreTS idea. They can be optionally attached to the KVs when needed. The approach does not change the existing KVs nor any comparator code. Maybe the above is not a good example as that seems something that is eventually needed for all KVs and would warrant for a more tighten terrain into the key and comparators. But I am hoping you get the idea?

So far the other approaches (if discussed on that level yet at all) are trying to store the security in collateral places, which I think is not good enough to be compared with Accumulo. But I might be wrong. I personally think that having the ACLs outside the KV asks for situations where the info is lost. I think the data is so important, it should be part of the KV itself. Just thinking out loud. 
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495528#comment-13495528 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

bq. If cp fails to load, will all the data then become visible Andrew? 

This is straightforward to address. The CP framework can fail to open a region if required CPs cannot be loaded and initialized. This is HBASE-6873. 

                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396972#comment-13396972 ] 

Andrew Purtell edited comment on HBASE-6222 at 6/19/12 10:34 PM:
-----------------------------------------------------------------

Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying tags in the KV should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around? Edit: By that I mean a pointer to where they are serialized in byte[] if doing it lazily as you suggest.

Edit: Removed one line of language where I went off in the weeds a bit.
                
      was (Author: apurtell):
    Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying tags in the KV should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around? Edit: By that I mean a pointer to where they are serialized in byte[] if doing it lazily as you suggest.
                  
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jimmy Xiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296186#comment-13296186 ] 

Jimmy Xiang commented on HBASE-6222:
------------------------------------

We can put some attribute on the table level to control if the feature is enabled for a table.  So that in normal case, the KV is not affected.
We can also put such information in a system table.
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393621#comment-13393621 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

bq. All-in-all, the tags raise some interesting questions in my mind like "why isn't everything in the KV implemented as a tag?" For example, i'd love to see per-cell TTL's someday. Maybe tags go hand-in-hand with co-processors to implement features like that?

The metacolumn idea was actually the result of some brainstorming on how to do per-cell TTLs. By no means do I raise it as a potentially superior alternative, simply one implementation strategy that wouldn't touch core. However tags in KV would be another way to go that touches core only once. 

We could agree on criteria such as:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* Generic mechanism for adding, reading, removing, and modifying tags, usable by coprocessors.

maybe?
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395856#comment-13395856 ] 

Laxman commented on HBASE-6222:
-------------------------------

Thanks for clarification Anoop. So, one of the concerns w.r.to cache can be ruled out for further discussion.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492029#comment-13492029 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

I am currently exploring/coding a couple of implementation options in parallel. Please ping me if you're working in this area.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393356#comment-13393356 ] 

Jonathan Hsieh edited comment on HBASE-6222 at 6/16/12 6:00 PM:
----------------------------------------------------------------

@Andy I'm not proposing we necessarily do it the way accumulo does it -- the more important point is that we mimic it's semantics to make it easier for users to eventually port to HBase. :) 

The meta column idea you describe in HBASE-2893 and its use cases seems to also argue for  restoring the distinction between locality groups and column families from original bigtable paper. (these are conflated in HBase).  

Using a meta column for cell-granularity visibility seems like a reasonable hbase implementation option.  I'm not convinced why this approach would save significantly more space.  I do think  the not changing core argument is more compelling hbase-specific constraint.

The accumulo implementation has a default case that would make it cost the tags cost extra only on cells with non-default behavior (e.g. only the highly-sensitive or cols with sensitivity varying cells).  They delta encode their keys (including the visiblity tags) like we do in 0.94.

As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space -- we could use a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags. 
                
      was (Author: jmhsieh):
    @Andy I'm not proposing we necessarily do it the way accumulo does it -- the more important point is that we mimic it's semantics to make it easier for users to eventually port to HBase. :) 

The meta column idea you describe in HBASE-2893 and its use cases seems to also argue for  restoring the distinction between locality groups and column families from original bigtable paper. (these are conflated in HBase).  

Using a meta column for cell-granularity visibility seems like a reasonable hbase implementation option.  I'm not convinced why this approach would save significantly more space.  I do think  the not changing core argument is more compelling hbase-specific constraint.

The accumulo implementation has a default case that would make it cost the tags cost extra only on cells with non-default behavior (e.g. only the highly-sensitive or cols with sensitivity varying cells).  They delta encode their keys (including the visiblity tags) like we do in 0.94.

As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space -- we'd could a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags. 
                  
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396972#comment-13396972 ] 

Andrew Purtell edited comment on HBASE-6222 at 6/19/12 6:33 PM:
----------------------------------------------------------------

Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying KVs should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around? Edit: By that I mean a pointer to where they are serialized in byte[] if doing it lazily as you suggest.
                
      was (Author: apurtell):
    Correct me if I'm wrong but given all of the above discussion therefore the scope of changes necessary are:

* Add a facility for storing string tags with each KV. Will need to do this for both puts and deletes.

* Update the KV comparators to take into account tags in the sort order, if present.

* Add a coprocessor (perhaps another implementation of Constraint) that:
** Implements a visibility filter that evaluates for each KV if the current set of user-supplied labels intersects with allowed labels according to visibility predicates stored in the KV, if any, and emits (or not);
** Unpacks labels added by the client via our generic op attributes facility and adds them to KVs in a mutator op or programs the visibility filter on a Get or Scan;
** Always adds the visibility filter to Get and Scan operations, wrapping any other supplied filter with FilterList(MUST_PASS_ALL) and the visibility filter on the RHS so we can let the other filters reduce the number of visibility evaluations necessary.

* Shell support
** Add setuaths and getauths equivalent shell commands
** Extend the other shell ops with sugar for adding labels as a working example of how to do it via the Java client
** Table create command sugar that adds the Constraint specification without a lot of user typing necessary?

And the constraints on implementation are:

* Tag storage optimized out if no tags present

* Compartmentalized changes

* The mechanism for adding, reading, removing, and modifying KVs should be generic

Does that about sum it up?

@Laxman: Such a Constraint-based coprocessor as described would be orthogonal to AccessController. One could use both at the same time independently, or they could be unified at some point. 

@Lars
{quote}
bq. @Andy: Compartmentalized changes

Agreed, we add a new type and handle that case separately. Though the majority of the code is shared, the new type would trigger the extraction of the tags if called for (which I assume would be done lazily).
{quote}

So one new type that causes us to extract tags then recurse on the actual wrapped op? And we extend all necessary internal method signatures to carry tags around?
                  
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393374#comment-13393374 ] 

Andrew Purtell edited comment on HBASE-6222 at 6/16/12 7:38 PM:
----------------------------------------------------------------

Thus far we have various hooks and a proposal for an out of band approach: HBASE-2893, HBASE-4477, API object attributes (HBASE-3811, HBASE-3921, HBASE-3931, etc.). The impetus (as I understand it) for an out of band approach is the special nature of KV and the desire to avoid adding additional conditionals/complexity in code wherever it touches KV. That is balanced by the problems an out of band approach causes for minimizing overheads when tags are present, as Jon mentions, without locality groups (a significant change) there is extra IO. I see this as a more conservative approach that won't perturb performance sensitive members of our community, because there is zero overhead unless configuration is toggled on, coprocessors are loaded, and such.

However I am not arguing for or against any particular approach. The suggestions by Jon, Lars, and Jimmy are the start of a design proposal but should be pulled together into an actual proposal (as another JIRA?). This is my point. Adding even only one byte to every KV has a very significant effect on storage and IO when the data set is large. Adding a new type avoids that but what is the impact in the code? And it wouldn't be one new type right, there would be a tagged analogue for every mutator? A convincing exploration of these issues would go a long way here IMHO.
                
      was (Author: apurtell):
    Thus far we have various hooks and a proposal for an out of band approach: HBASE-2893, HBASE-4477, API object attributes (HBASE-3811, HBASE-3931, etc.). The impetus (as I understand it) for an out of band approach is the special nature of KV and the desire to avoid adding additional conditionals/complexity in code wherever it touches KV. That is balanced by the problems an out of band approach causes for minimizing overheads when tags are present, as Jon mentions, without locality groups (a significant change) there is extra IO. I see this as a more conservative approach that won't perturb performance sensitive members of our community, because there is zero overhead unless configuration is toggled on, coprocessors are loaded, and such.

However I am not arguing for or against any particular approach. The suggestions by Jon, Lars, and Jimmy are the start of a design proposal but should be pulled together into an actual proposal (as another JIRA?). This is my point. Adding even only one byte to every KV has a very significant effect on storage and IO when the data set is large. Adding a new type avoids that but what is the impact in the code? And it wouldn't be one new type right, there would be a tagged analogue for every mutator? A convincing exploration of these issues would go a long way here IMHO.
                  
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393450#comment-13393450 ] 

stack commented on HBASE-6222:
------------------------------

@Lars When you say "...I thought it might be good to have the option to add "tags" to each KV", what would a tag look like?  A bit set?  Or some name/values within the KV?

I like Jon's suggestion that HBASE-2893 could be less onerous if we implemented Locality groups (the meta-cf would be co-mingled w/ the data its meta for).

bq. They delta encode their keys (including the visiblity tags) like we do in 0.94.

Should we turn the above on as default?

bq. As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space – we could use a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags.

KVs are also versioned so we should be able to up the version and add new in-key facility (A while back I messed w/ doing this adding a sequenceid beside the timestamp... I did it as "conditional code" rather than as subclass which for sure compounded the complexity of the already-complex KeyValue -- unfortunately, hopefully there is a better route -- but it seemed possible making different KV types float in same scanner merge....).  If we add a ColumnVisibility-like expression into the KV, we'd have to update Comparators to exclude this portion from inclusion in sort.

On making KV an Interface, that'd be cool.  Todd had a go at it a while back but the ripple turned into an avalanche of code changes IIRC so he suggested we do it piecemeal.  It'd be sweet though.

+1 on Andrew's proposal first.  We need a driver?

On faithful mimicking of Accumulo's implementation, that makes sense in the absence of an actual customer who needs this stuff (Jon? You have one?)?  Maybe the Accumulo implementation is far from what is wanted -- these 'expression's are passed in the clear -- and a superior implementation might not be that much of a stretch? 
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492608#comment-13492608 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

Thanks for attaching the design documents. I could comment on them but am pursing a substantially different direction.

Regarding KV tags:

Right now I'm experimenting with code that adds tags to KVs in a backwards compatible way. I would call this a transitional change. We overload the existing KV binary format, technically the tags are prepended to value data but are not considered part of it. For KVs that contain tags, the value length is encoded as a negative integer. When a negative value length is encountered, we walk over the value part of the KV, parsing out tags, until we reach an end marker. Once we know the size of the tag data, we adjust the value offset and value length returned to API callers. The parsing of tags in the KV is cached after the first time it is performed. Thus no change to KV handling has to happen outside of the KeyValue class.

Regarding cell level permissions:

Instead of visibility labels we extend the AccessController to support per cell ACLs. These are serialized TablePermissions stored with a KV in a tag. We add a helper class UserTablePermissions which supports both PB and Writable serialization of Listmultimap<String,TablePermission>. The permissions checking of the AccessController is extended in a straightforward way to check if any KV passing through the AccessControlFilter contains tags containing ACLs. If so, an access grant check is first performed using TablePermissions deserialized from the tag(s). This allows per cell ACLs to overload CF or table level permissions wherever they might exist. While the TablePermission type is not strictly needed for this (Permission would suffice), it was easier to simply allow a TablePermission to store nulls for table, cf, and qualifier than make more extensive changes to the AccessController. Doing so could be considered a refinement. This does mean that the AccessControlFilter implementation is changed to pass back SKIP instead of NEXT_COL hints, so every KV is examined by the filter, and use of the filter to wrap Gets and Scans is now unconditional. The performance impact of this has yet to be quantified but is expected not to be that different from changes that require per-KV visibility labels to be evaluated (and could be less expensive since we are not evaluating a mini label language).

No API changes are necessarily required. The new UserTablePermissions type is serialized to PB and then passed as an attribute using the existing API facility for that. Not all mutation types are supported. It doesn't make sense to attach an ACL on a Delete, because the KVs covered by the delete will be ... deleted. Additionally, Increment is not currently supported because it doesn't support attributes (doesn't extend either Mutation nor OperationWithAttributes) and anyway the upsert code paths don't pass KVs through a coprocessor down into HBase core so bundling tags with the user value doesn't work there. Also, while I haven't done this quite yet, I anticipate the upsert code will require a second core modification to preserve any tags that might be in the value to update.

Obviously my goal is to add per-cell ("per-KeyValue") security through as simple and straightforward extensions of the existing AccessController implementation as possible, with minimal to no changes to core code*. I do not aim for Accumulo parity nor support for multi label security schemes. It may be more appropriate to post this work to another JIRA when it is ready for review. This work could be considered orthogonal to any label based approach, and could probably play nicely together with it should one come to exist, as long as both are based on a generic KV tag facility.

* - I am also evaluating an option that avoids making any change to KeyValue, thus no change to core HBase at all. Instead it stores the equivalent of KV tags in "shadow KVs" stored in a shadow metacolumn. Implementation of this is underway but as you can imagine it's not so straightforward to produce something that could perform (nearly) as well as inline in-KV checks. This approach wouldn't have the limitation regarding Increment nor the requirement to change the upsert code.

All of the above is subject to change.
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Jerry Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485788#comment-13485788 ] 

Jerry Chen commented on HBASE-6222:
-----------------------------------

[~saint.ack@gmail.com]
bq. A put that broadened visibility would be for the current put only? How would it effect already-put values?
When the visibility is part of the column key, a broadened visibility will not affect the existing columns that with the same {rowid, family, qualifier} but with different visibilities. Thus, the put will only affect the columns that has the same {rowid, family, qualifier, visibility}. Different visibilities has the same effect as different qualifiers.

While as to DeleteFamily or DeleteColumn, Accumulo doesn't have such as operations. It has only Delete mutation to delete a specified {rowid, family, qualifier, visibility}. The idea to keep DeleteFamily and DeleteColumn still working with visibility in HBase is that A DeleteFamily operation now will only affect "all columns in this family with the specified visibility" other than originally "all columns in this family". The same with DeleteColumn.

One thing to consider if the visibility is part of the key. As there are suggestions to provide support for general tags for KV so that not only visibility tags can be stored in it, but also other tags that needed in the future can be added easily. Will general tags concept (comparing to a visibility tag) makes the concept of the column key too complex?




                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6222) Add per-KeyValue Security

Posted by "Himanshu Vashishtha (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-6222:
---------------------------------------

    Attachment: HBaseCellRow-LevelSecurityPRD.docx
                HBaseCellRow-LevelSecurityDesignDoc.docx

[~apurtell] We would like to collaborate on adding this feature. What are the approaches you are considering?
 
We also looked into this feature a while ago, and came up with the attached PRD and design. It will be great to have your feedback on it. 

                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393356#comment-13393356 ] 

Jonathan Hsieh commented on HBASE-6222:
---------------------------------------

@Andy I'm not proposing we necessarily do it the way accumulo does it -- the more important point is that we mimic it's semantics to make it easier for users to eventually port to HBase. :) 

The meta column idea you describe in HBASE-2893 and its use cases seems to also argue for  restoring the distinction between locality groups and column families from original bigtable paper. (these are conflated in HBase).  

Using a meta column for cell-granularity visibility seems like a reasonable hbase implementation option.  I'm not convinced why this approach would save significantly more space.  I do think the not changing the not changing core argument is more compelling constriant.

The accumulo implementation has a default case that would make it cost the tags cost extra only on cells with non-default behavior (e.g. only the highly-sensitive or cols with sensitivity varying cells).  They delta encode their keys (including the visiblity tags) like we do in 0.94.

As a straw man, a core-changing HBase version that modified KV's wouldn't necessarily incur more space -- we'd could a new KeyValue.Type (TaggedPut?) that had visibility settings, and use the old Put type that used the default tags. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494750#comment-13494750 ] 

Andrew Purtell commented on HBASE-6222:
---------------------------------------

[~v.himanshu] Thanks for the gracious comments.

bq. Currently, all the acl entries are stored in zk (limit of znode is 1 mb); will you be using the same approach?

No. Pursuing an incremental approach, so, we keep per-table and per-CF ACLs implemented as-is for now. Then add per cell ACLs as a third level of permissions checking. Per cell ACLs are stored in the region alongside the data. It's an open question if having an ACL table will still be necessary as things evolve.

bq. Its not clear how/where you are storing the acls at keyvalue level.

There are different approaches to doing this, will try two of them to establish the extent of changes to core code required and performance differences:

1. Store per cell ACLs in tags in KV, using the approach to putting tags in KV that I described above. 

2. Store per cell ACLs in a "shadow ACL column". Per cell ACLs are still associated with individual KVs but in effect tag storage is moved external to the data KV, so no changes to the KeyValue implementation are necessary, and stored in the same region. Will incur some additional cost for querying tags but will not require intra cluster RPC.

3. Store per cell ACLs in the ACL table. Tag storage again is external to the data KV, and now in a separate table. Will incur cost for querying tags and furthermore require intra cluster RPC. Probably won't actually try this.

bq. You use acl table, or something else?

The ACL is stored "in" the KV (depends which impl option if that is physical or logical). The table, row, family, qualifier, and timestamp of an in-KV permission are implicit -- those of the KV's location. I do use TablePermission to represent ACLs at the KV level but only to avoid some refactoring in the AccessController that would be outside the scope of a proof of concept. Pardon if mentioning this detail was confusing.

{quote}
bq. It doesn't make sense to attach an ACL on a Delete, because the KVs covered by the delete will be ... deleted.
I don't completely agree with this but will not comment also unless I completely understand your approach.
{quote}

We treat ACLs on a KV as timestamped like the rest of the KV. An ACL in a new Put applies only to that Put. It doesn't change the ACL of any previous Put. However, we require a Put to have covering[1] permission -- so either table perms, CF perms, or perms in the most recent visible existing value, if any, must allow the pending Put (or Append, or Increment, or any mutation) in order for it to be applied. This is because the pending operation will replace the visible value and its ACL, if it has one, for Gets and Scans that don't ask for more than one version. If you actually want to change the ACL on a specific KV, you must Delete then Put, if you have perms for that. On reads, a KV is visible to the InternalScanner if either table perms, CF perms, or perms stored in the KV allow, similar to visibility labels in a sense but with ACL semantics. The AccessControlFilter is extended to do that[2]. For Deletes, _all_ visible prior versions covered by the Delete must allow the Delete, because a major compaction could remove them if we allow the tombstone and it covers them, regardless of any ACL they might contain. "Visible" here is defined as not covered by an already committed tombstone. This allows simple and straightforward evolution of security policy over time without requiring expensive updates[3].

So a Delete doesn't update ACLs. ACLs are bits of information to be stored with the KV. It's meaningless to put an ACL on an op that stores nothing. And, an ACL on a Delete would not grant that Delete permission to do something. It would be the union of per table, per CF, and all ACLs on visible covered values that would grant (or not). Since we are only looking at the most recent version of a KV, if any, except for Deletes, only Delete - and especially DeleteColumn - can be potentially expensive to check. There is no free lunch but union-of-ACL semantics helps: Granting appropriate per table or per CF perms to a user or group obviates the need for KV ACL checking for that user or her groups, so we can early out and avoid going to the store in that case.

[1] Maybe "covering" isn't quite the best term. Suggestions welcome.

[2] This changes AC semantics some. A user who doesn't have permissions to a CF or exact location (column+qualifier) just won't get values back from a scan or get, as opposed to previous behavior which would throw an AccessDeniedException if the user did not have required perms to the CF at the CF level. I refactored TestAccessController to address this in a clean way.

[3] On the other hand GRANT and REVOKE commands could provide a convenient way to change per KV ACLs too, since they are DDL not DML ops. We could consider enhancing the GRANT and REVOKE command processing in the AccessController to replace all per KV ACLs covered by the action, if they exist. A possible optimization is lazy updates via a timestamped log of recent GRANTs and REVOKEs (but I just thought of that now so it's a superficial suggestion).
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494820#comment-13494820 ] 

stack commented on HBASE-6222:
------------------------------

If cp fails to load, will all the data then become visible Andrew?  Anything we can do to have it default to be non-viewable unless via a visibility CP (Otherwise, adding generic tags facility to kv w/ visibility done in a visibility cp sounds sweet)
                
> Add per-KeyValue Security
> -------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>            Assignee: Andrew Purtell
>         Attachments: HBaseCellRow-LevelSecurityDesignDoc.docx, HBaseCellRow-LevelSecurityPRD.docx
>
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6222) Add per-KeyValue Security, get federal funding for HBase?

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393361#comment-13393361 ] 

Lars George commented on HBASE-6222:
------------------------------------

Oh, and that approach would also leave all the comparators etc. untouched. That sounds like a good thing. 
                
> Add per-KeyValue Security, get federal funding for HBase?
> ---------------------------------------------------------
>
>                 Key: HBASE-6222
>                 URL: https://issues.apache.org/jira/browse/HBASE-6222
>             Project: HBase
>          Issue Type: New Feature
>          Components: security
>            Reporter: stack
>
> Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The  Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure whats going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira