You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/04/19 21:41:49 UTC

[kudu-CR] [authz] updated SentryAuthzProvider caching strategy

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13069


Change subject: [authz] updated SentryAuthzProvider caching strategy
......................................................................

[authz] updated SentryAuthzProvider caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only
one entry per sanitized Sentry's response was created.  With
this patch, a response may be split into two entries: one
contains server- and database-scope privileges, and another
contains table- and column-scope privileges.  Of course, it
also changes the lookup process: now it's necessary to search
for two entries in the cache if looking up for information
related to table-level authz scope.

The new caching strategy leverages the fact that Sentry includes
privileges granted on authorizables of higher scopes in the hierarchy,
if any.  The new strategy is beneficial in cases when a user
has privileges granted on database, and those privileges imply
privileges on tables and columns.  In that case, once there was
a request to authorize an action on one table or a column of that table,
next request to authorize an action on another table or column of
another table will hit the cache, avoiding extra RPC to Sentry.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 203 insertions(+), 35 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(6 comments)

Another partial review, will do a fuller pass, but posting in case my comment on the commit message is helpful to other reviewers.

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
There are a bunch of details in here, but I think it'd be easier to follow, and in turn, review the rest of this patch, with roughly the following structure:
1. some context explaining what was wrong with the existing caching strategy
2. a concrete example depicting what was bad
3. what is the high-level approach that fixes this problem?
4. how the problem in #2 will work now.
5. other implications/examples, if any seem appropriate.

Here's my attempt based on my understanding of this patch (feel free to correct anything that's off):

The API that Kudu uses to fetch privileges from Sentry will return privileges corresponding to all ancestors and descendants of the requested authorizable in Sentry's authorizable scope hierarchy tree. For instance, if requesting database-level privileges on "s.d", the API will return privileges on "s", "s.d", privileges on all tables within "s.d", and privileges on all columns therein. This is unsettling because a single database may have many tables and columns, and we would currently end up caching all of them, even those that may not be tables and columns stored in Kudu.

Additionally, the existing cache-lookup policy is very simple and doesn't allow the usage of usable privileges that exist in the cache where it could. For instance, the existing policy might store the following in the cache:

 { /s/d/t => ["metadata" on /s, "create" on /s/d, "insert" on /s/d/t, "select" on /s/d/t/c] }

A cache lookup for privileges on /s/d would yield an expensive database-scoped request to Sentry, even though, if we were smarter about how we structured our privileges in cache, we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges into two entries: those for database-scope and above (keyed by database), and those for table-scope and below (keyed by table). For cache lookups for a given authorizable, this allows us to use existing privileges for higher scopes from other authorizables where possible, avoiding expensive calls to Sentry for server- or database-scoped privileges. Now, for the same set of privileges, the cache will store:

 { /s/d => ["metadata" on /s, "create" on /s/d],
   /s/d/t => ["insert" on /s/d/t, "select" on /s/d/t/c] }

With the above in cache, a lookup on /s/d would thus be able to leverage the existing entry. The trade-off here is that lookups on /s/d/t will now require two lookups, one on /s/d and one on /s/d/t, to return the complete lists of privileges in /s/d/t's hierarchy tree.

Why database and above, and table and below? Why not separate them further? Based on our current privilege model, we only expect to request privileges at the database scope (e.g. for CreateTable) or table scope (e.g. OpenTable), so this separation seems natural while limiting the number of cache lookups as much as possible.

As a further optimization on cache size, because there aren't constraints that the privileges in Sentry when requesting privileges at the database scope correspond to things stored in Kudu, we will now ignore privileges below the database scope when making such requests when caching.
(maybe this belongs in a separate patch)


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@541
PS2, Line 541: HECK(kScopeDabase.Implies(scope));
             :   // Not expecting requests for authorizables of scope narrower than TABLE.
             :   DCHECK(scope.Implies(kScopeTable));
             :   // Do not query Sentry for authz scopes narrower than 'TABLE'.
             :   const auto& key = aggregate_key.GetFlattened
> No, it does not -- those are queries to the cache.
Oof, yeah pretty obvious in retrospect. Rushed code review is bad code review.


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@610
PS2, Line 610: omes for an authorizable of the TABLE scope.
> OK, so far I used Implies(), but I'm thinking of adding comparison operatio
Yeah, I feel you. Maybe it's worth just being explicit and doing requested_scope == TABLE || requested_scope == COLUMN


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@534
PS4, Line 534: scope(requested_scope);
nit: would you mind reversing the names so that we use requested_scope throughout? That way it's explicitly obvious when reading below what scope we're referring to.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@539
PS4, Line 539: // TODO(aserbin): uncomment when it's guaranteed we are not getting
             :   //                requests for authorizables of scope wider than DATABASE.
I think this is already true; we never request privileges at the server scope.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@545
PS4, Line 545: kScopeTable.scope())
nit: I like these in that it means we don't have to do annoying things like write out SentryAuthorizableScope(SentryAuthorizableScope::TABLE) all the time, and I know it should be obvious, but when pulling out the enum like this, IMO this takes extra cycles to read. Could we keep SentryAuthorizableScope::TABLE here and L617 and L637?



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Apr 2019 07:30:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#6).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(table), where 'table' is 'tableA' or another
name.

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 376 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/9/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/9/src/kudu/master/sentry_authz_provider-test.cc@855
PS9, Line 855: ASSERT_EQ(1, GetTasksSuccessful());
> Is it possible to check the size of the cache entries? If not, just ignore 
It's possible to check the contents of each entry of the cache if you know their keys.  And it's possible to check the overall size of the cache -- there is a cache's metric for that.

I guess you'd like to make sure the cache entry for the 'db' authorizable doesn't contain information on 't0' and 't1' after call to AuthorizeCreateTable("db.table")?  Yep, we can do that.  Let me do that in a separate follow-up patch: this patch has already been around for some time, so more conflicts to resolve with other patches around.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Apr 2019 16:31:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6: Code-Review+1

(5 comments)

LGTM, just a few nits.

http://gerrit.cloudera.org:8080/#/c/13069/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/5//COMMIT_MSG@87
PS5, Line 87: irrelevant information on non-Kudu
            : tables sent by Sentry with a _database-scope_ response
Might worth a test for this as well?


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@776
PS6, Line 776: db
nit: still a bit confused here, do you mean 'db' in L780? Because you are doing AuthorizeCreateTable for 'db1' there but the comment is showing otherwise.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@225
PS6, Line 225: GetFlattenedKey
nit: can you add a comment here?


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@234
PS6, Line 234: 2
nit: can you briefly explain why it is 2 now?


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@529
PS6, Line 529: uncomment when it's guaranteed we are not getting
             :   //                requests for authorizables of scope wider than DATABASE.
We don't have any Server level authorizables request today? Or there are some in tests?



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 19:17:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   ASSERT_EQ(3, GetTasksSuccessful());
             : 
             :   // All the requests below should also hit the cache since the information on
             :   // the privileges granted on each of the tables in the requests below
             :   // is in the cache. In the Sentry's privileges model for Kudu, DROP TABLE
             :   // requires privileges on the table itself, but nothing is requred on the
             :   // database the table belongs to.
> I think it would be clearer because using Authorize doesn't presume knowled
Yep, I added comments in attempt to clarify on the policies reflecting the Sentry's privileges model for Kudu.  I think we better keep this as more 'synthetic' test to have stronger guarantees on the behavior of the cache in SentryPrivilegesFetcher.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // while fetching the information privileges on 'db.t0' authorizable of the
             :   // TABLE scope.
             :   for (int idx = 0; idx < 10; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :    
> Same here, I think it would be clearer because using Authorize doesn't pres
Yep, I agree that would be a bit clearer and that make sense.   However, I'd like to keep these tests more 'synthetic' so we are sure the cache behavior is reasonable and beneficial for the actual usage patterns we have in the code.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250
PS4, Line 250: key_sequence_max_size() method.
> Ah, much clearer. Thanks!
Yep, sure -- I'm working on the patch to get rid of SERVER-scope things scattered in a few places.  Once they are gone and we have clear constraints on what we expect, I'll update this as well.  I'm not sure I want to have it all in this changelist, even if it simplifies this code a bit.


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@252
PS5, Line 252: authorizable
> nit: authorizables
Done


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@611
PS5, Line 611:     if (requested_scope == SentryAuthorizableScope::DATABASE ||
             :         requested_scope == SentryAuthorizableScope::TABLE ||
             :         requested_scope == SentryAuthorizableScope::COLUMN) {
> Once we have the CHECK_NE SERVER, we should be able to get rid of the condi
Right, exactly.  Thank you for the reminder.


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@633
PS5, Line 633:         requested_scope == SentryAuthorizableScope::COLUMN) {
> This shouldn't be possible from L538, right? Same above.
This is still possible in a RELEASE build.  Do you think we want to convert DCHECK at line L538 into a CHECK?  We might allow COLUMN requests with any issues -- the scope is artificially widened into TABLE at L541.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 18:28:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#9).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
The other scenario that benefits from the new caching scheme is a
a sequence of AuthorizeDropTable() followed by AuthorizeCreateTable().

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 424 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/9
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#7).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(table), where 'table' is 'tableA' or another
name.

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 423 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/7
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(12 comments)

Looks good for the most part; I've got a few nits focused around clarity and making parts of the implementation more explicit through comments.

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> Thanks for the detailed explanation. Could we further "micro-optimize" by s
I think you're referring to optimizing which branch we copy? I don't think it would make a difference; seems we're copying into an empty branch in a loop.

Also, instead of being a part of the commit message, I wonder if this should instead go somewhere in SentryAuthzFetcher.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.t1", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   ASSERT_EQ(3, GetTasksSuccessful());
nit: I think it would be clearer what this is trying to test if we used Authorize() directly on a TABLE authorizable.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // Same story for requests for 'db1.t0', ..., 'db1.t19'.
             :   for (int idx = 0; idx < 20; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :         table_name, kTestUser, kTestUser));
             :   }
nit: I think it would be clearer what this is trying to test if we used Authorize() directly on a DATABASE authorizable.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h@127
PS4, Line 127:   // Add/merge privileges from other instance of SentryPrivilegesBranch.
It's probably worth documenting that, while we don't explicitly enforce or check for it, we expect that the 'other' branch is logically partitioned from the privileges in this branch. E.g. that this branch has TABLE and COLUMN scoped privileges, and 'other' has DATABASE and SERVER or vice versa.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@240
PS4, Line 240:   // Generate the 'raw' key sequence: a sequence of keys for the whole authz
             :   // scope hierarchy based on the scope hierarchy in 'authorizable'
             :   // for the specified 'user'.
Would be helpful to add an example, especially since this can easily be conflated with GenerateKeySequence().


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@247
PS4, Line 247:  The lookup sequence
             :   // might differ from the 'raw' sequence: it all depends on how the cache
             :   // is populated given sanitized ListPrivileges() Sentry's response.
I'm not sure what this means. I think you're trying to be vague in case this changes in the futuer, but I don't think it's worth it. If we want to change the behavior later, we can just change the comment. So could you just document what the current behavior is? What can a user of this function expect to get back and in what order?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250
PS4, Line 250: As of now, the cache stores data only under DATABASE- and TABLE-scope keys.
I think this means, "The returned key sequence will only contain DATABASE- and TABLE-scoped keys." If so, just say that. Vagueness helps decouple this from implementation, making it slightly easier to change later, but hurts understandability, which I value more in the context of security-related code.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@255
PS4, Line 255: // Convert Sentry authz scope into index of the raw key sequence.
Reword to indicate a bit more why this is useful:

Convert the Sentry authz scope to an index in the list: { SERVER, DATABASE, TABLE, COLUMN }. This is useful for generating keys comprised of authorizable names at different scopes.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@273
PS4, Line 273:   if (!scope) {
             :     // The flattened key without scope level restrictions
             :     // is the very last element of the key_sequence_.
             :     return raw_key_sequence_.back();
             :   }
Is this used?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@315
PS4, Line 315: 
             : // As of now, 
nit: while this makes this more future proof, it also somewhat clutters comments and distracts from the more important details of what's implemented.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@329
PS4, Line 329:   if (sequence.empty()) {
             :     sequence.emplace_back(raw_sequence.back());
             :   }
This seems to be here in case we request server-level privileges. I don't think it's important?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@629
PS4, Line 629:  From the other side, the information on relevant
             :     // Kudu tables will be fetched as soon as it's requested directly and the
             :     // cache is missing the required entry. In other words, information on
             :     // privileges for authorizables of the SentryAuthorizableScope::TABLE scope
             :     // are fetched from Sentry explicitly.
Reword:

If we need table-level privileges, we should request them explicitly; upon doing so, if none are found in the cache, we will fetch them to Sentry.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Apr 2019 22:18:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#8).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(table), where 'table' is 'tableA' or another
name.

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 422 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/8
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 9: Verified+1

unrelated flake in org.apache.kudu.client.TestKuduScanner


-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Apr 2019 20:15:30 +0000
Gerrit-HasComments: No

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 7: Verified+1

unrelated flake in Java tests: org.apache.kudu.client.TestTimeouts


-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 03:16:38 +0000
Gerrit-HasComments: No

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#4).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains SERVER- and
DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the TABLE scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on DATABASE.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(tableA).

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 319 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] updated SentryAuthzProvider caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] updated SentryAuthzProvider caching strategy
......................................................................


Patch Set 2:

(3 comments)

I need to head out now, but jotted down some notes.

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@541
PS2, Line 541: auto handle = cache_->Get(e);
             :       if (!handle) {
             :         continue;
             :       }
             :       handles.emplace_back(std::move(handle));
We're now doing Get() in a loop. Does this mean that we might go to Sentry multiple times?


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@548
PS2, Line 548:   if (!handles.empty()) {
             :     SentryPrivilegesBranch result;
             :     for (const auto& e : handles) {
             :       DCHECK(e);
             :       result.Merge(e.value());
             :     }
             :     *privileges = std::move(result);
             :     return Status::OK();
             :   }
> I suspect there might be a bug here.  Imagine that information on privilege
Yeah this is a bug:

- the user has privileges on "db" and we have METADATA ON DATABASE
- our privileges for "db.table" expired, but the user _should_ have SELECT ON TABLE
- we go through these checks and end up with only METADATA ON DATABASE when we should have really gone to Sentry

This means that if _any_ of the Get()s don't return a handle, we need to go to Sentry.


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@610
PS2, Line 610: requested_scope >= SentryAuthorizableScope::TABLE
This seems to be tied to the implementation/order of the enums instead of the logical scopes themselves. `!SentryAuthorizableScope(requested_scop).Implies(DATABASE)` would be clearer.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Apr 2019 01:03:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h@126
PS6, Line 126: 
             :   // Add/merge privileges from other instance of SentryPrivilegesBranch.
             :   void Merge(const SentryPrivilegesBranch& other);
             : 
             :   // Output the privileges into branches corresponding to DB-and-higher and
             :   // TABLE-and-lower authz scopes.
             :   void Split(SentryPrivilegesBranch* other_scope_db,
             :              SentryPrivilegesBranch* other_scope_table) const;
> Yep, I thought about that already.
Ah, I missed that in both cases we're operating on cached objects. Agreed that it doesn't make sense.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@422
PS6, Line 422:     switch (e.scope) {
> I might be missing something here: I'm not sure that's what we want here: e
Mike and I talked about emplace_back (without std::move) vs. push_back in another code review: https://gerrit.cloudera.org/c/12205/3/src/kudu/common/generic_iterators.cc#441

tl;dr: it's easier to remember to always use emplace_back than to know to use push_back when copying and emplace_back when moving. Mike also wrote a microbenchmark that showed that emplace_back was faster.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 03:37:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 8: Verified+1

unrelated flake in ClientTest.TestServerTooBusyRetry


-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 05:09:22 +0000
Gerrit-HasComments: No

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 7: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@534
PS7, Line 534: t expected to receive any requests
nit: "receive any requests" makes it sound like this is a service and that the requests are external. Maybe reword this as, "SentryPrivilegesFetcher is not expected to request authorizables of the SERVER scope unless this method is called from test code."

Also, following our in-person chat about introducing a flag to gate this, I realized we actually already have a flag for checking if we're in a test: IsGTest()
We using sparingly though because it's kind of ugly and doesn't add a whole lot as test coverage goes IMO


http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@538
PS7, Line 538: received request to fetch privileges of the SERVER scope "
             :         "on authorizable '$0' for user '$1'", table_name, user);
nit: same here re "received request", maybe reword as "requesting to fetch privileges of the SERVER scope on..."



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 03:16:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h@126
PS6, Line 126: 
             :   // Add/merge privileges from other instance of SentryPrivilegesBranch.
             :   void Merge(const SentryPrivilegesBranch& other);
             : 
             :   // Output the privileges into branches corresponding to DB-and-higher and
             :   // TABLE-and-lower authz scopes.
             :   void Split(SentryPrivilegesBranch* other_scope_db,
             :              SentryPrivilegesBranch* other_scope_table) const;
> Ah, I missed that in both cases we're operating on cached objects. Agreed t
Ack


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@422
PS6, Line 422:     switch (e.scope) {
> Mike and I talked about emplace_back (without std::move) vs. push_back in a
Done


http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@534
PS7, Line 534: t expected to receive any requests
> nit: "receive any requests" makes it sound like this is a service and that 
Done.

Nice find about IsGTest().  I'm not sure using it here brings a lot of value, since that's kind of a reverse case: it would make sense to use it if there were no call with the SERVER scope in tests, so we could  safely crash when spotting 'a programming error' in that imaginary case.


http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@538
PS7, Line 538: received request to fetch privileges of the SERVER scope "
             :         "on authorizable '$0' for user '$1'", table_name, user);
> nit: same here re "received request", maybe reword as "requesting to fetch 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 04:16:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@610
PS2, Line 610: requested_scope >= SentryAuthorizableScope::TABLE
> Yeah, but it makes sense to keep in that order that reflect the hierarchy. 
OK, so far I used Implies(), but I'm thinking of adding comparison operations for the scope: wider or narrower semantics.  At least, that way it will have easier to understand semantics and will be easier to read, IMO.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Apr 2019 05:39:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.t1", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   ASSERT_EQ(3, GetTasksSuccessful());
> I'm not sure I understand what would be clearer since those methods use TAB
Ah, probably it's implied that AuthorizeDropTable() might include checking privileges for a DATABASE-scope object in addition to that checking table-level privileges?

But then it would make sense to replace AuthorizeAlterTable() with Authorize() on a TABLE-scope object.

From the other side, even if database-level privileges were involved in those AuthorizeXxxYyy() methods, these assertions should still stay valid.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@747
PS4, Line 747:   ASSERT_OK(AlterRoleGrantPrivilege(GetDatabasePrivilege("db0", "ALTER")));
             :   ASSERT_OK(AlterRoleGrantPrivilege(GetDatabasePrivilege("db1", "CREATE")));
> Can you add a test that have table level privilege on a table but not datab
Master will not issue another RPC to Sentry in that case, but of course we want to have that path covered as well.  I'll add the test, sure.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@763
PS4, Line 763: db'
> Not following why mention 'db'  here as we are requesting for 'db1.tablex'?
Because AutorizeCreateTable() checks for DATABASE-scope privileges only, not TABLE-scope privileges.  I'll add a comment clarifying on that.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // Same story for requests for 'db1.t0', ..., 'db1.t19'.
             :   for (int idx = 0; idx < 20; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :         table_name, kTestUser, kTestUser));
             :   }
> I'm not sure I understand. The Authorize() method is a private method of th
Ah, I guess that's about 'possible' verification of table-level privileges withing AuthorizeCreateTable().

I think I'll add a comment about what AuthorizeCreateTable() is does/expected to do here to clarify a bit on what's is going on.  I'm hesitant to change to 'atomic' Authorize() because I think that using synthetic methods as 'AuthorizeXxxYyy()' for verification gives us even stronger guarantees on the cache behavior from one side, and keeps these tests closer to the real use-cases from the other.  These are not unit tests, anyways -- too many other components are involved in 'simple' Authorize() call.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 07:44:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13069/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/5//COMMIT_MSG@87
PS5, Line 87: irrelevant information on non-Kudu
            : tables sent by Sentry with a _database-scope_ response
> Might worth a test for this as well?
That's a good idea.  I added the test.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@708
PS6, Line 708: requried
> nit: required
Done


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@734
PS6, Line 734: requred
> nit: required
Done


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@776
PS6, Line 776: db
> nit: still a bit confused here, do you mean 'db' in L780? Because you are d
Fixed, thanks for pointing that out.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h@126
PS6, Line 126: 
             :   // Add/merge privileges from other instance of SentryPrivilegesBranch.
             :   void Merge(const SentryPrivilegesBranch& other);
             : 
             :   // Output the privileges into branches corresponding to DB-and-higher and
             :   // TABLE-and-lower authz scopes.
             :   void Split(SentryPrivilegesBranch* other_scope_db,
             :              SentryPrivilegesBranch* other_scope_table) const;
> What do you think about making these operations destructive? Meaning:
Yep, I thought about that already.
  * making Split() destructive doesn't seem a good alternative to me: it would change the shared response received from Sentry that might be used by other concurrent threads
  * making Merge destructive doesn't look like a good idea since it works with cache items, and moving privileges from 'other' means messing up the information stored in the cache.

As an alternative, it would make sense to store the same entry in the cache for both TABLE and DATABASE (if received as the response for the TABLE-scope authorizable), but since TTLCache doesn't have any extra-wrapper (e.g., like FileCache), that's not the case: the entry ownership for the underlying Cache must follow the 'unique' pattern.

Yes, at this point we do a lot of copying here, but I'm not sure this is a crucial point to address right now.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@225
PS6, Line 225: GetFlattenedKey
> nit: can you add a comment here?
Done


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@233
PS6, Line 233:   constexpr static size_t key_sequence_max_size() {
> I think this would be more clear if defined as a constant rather than hidde
Done


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@234
PS6, Line 234: 2
> nit: can you briefly explain why it is 2 now?
Done


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@422
PS6, Line 422:         scope_db.privileges_.push_back(e);
> Let's use emplace_back() here and on L426 for new code.
I might be missing something here: I'm not sure that's what we want here: emplace_back() with std::move() is not an option (see my response to your comment w.r.t. moving instead of copying in Merge() and Split() methods); and other cases would require specifying parameters for the constructors otherwise.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@529
PS6, Line 529: uncomment when it's guaranteed we are not getting
             :   //                requests for authorizables of scope wider than DATABASE.
> We don't have any Server level authorizables request today? Or there are so
We do.  Yes, those calls are from tests.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 02:47:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@541
PS2, Line 541: auto handle = cache_->Get(e);
             :       if (!handle) {
             :         continue;
             :       }
             :       handles.emplace_back(std::move(handle));
> We're now doing Get() in a loop. Does this mean that we might go to Sentry 
No, it does not -- those are queries to the cache.


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@548
PS2, Line 548:   if (!handles.empty()) {
             :     SentryPrivilegesBranch result;
             :     for (const auto& e : handles) {
             :       DCHECK(e);
             :       result.Merge(e.value());
             :     }
             :     *privileges = std::move(result);
             :     return Status::OK();
             :   }
> Yeah this is a bug:
This is addressed now.


http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@610
PS2, Line 610: requested_scope >= SentryAuthorizableScope::TABLE
> This seems to be tied to the implementation/order of the enums instead of t
Yeah, but it makes sense to keep in that order that reflect the hierarchy.  I'm not sure Implies() is clearer, at least for me Implies() is much more obscure.  IMO, in the context of PrivilegeFetcher it's easier to reason about hierarchy in terms of level.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Apr 2019 02:19:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@538
PS7, Line 538: on authorizable '$0' for user '$1'", table_name, user);
             :   }
> Done
From the other side, if thinking about running this code in kudu-master process as a part of external minicluster, spotting programming errors in DEBUG builds makes sense.  I'll post an update in a moment.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 17:53:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#3).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains SERVER- and
DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the TABLE scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on DATABASE.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(tableA).

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 305 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#5).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(table), where 'table' is 'tableA' or another
table.

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 324 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] updated SentryAuthzProvider caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#2).

Change subject: [authz] updated SentryAuthzProvider caching strategy
......................................................................

[authz] updated SentryAuthzProvider caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only
one entry per sanitized Sentry's response was created.  With
this patch, a response may be split into two entries: one
contains server- and database-scope privileges, and another
contains table- and column-scope privileges.  Of course, it
also changes the lookup process: now it's necessary to search
for two entries in the cache if looking up for information
related to table-level authz scope.

The new caching strategy leverages the fact that Sentry includes
privileges granted on authorizables of higher scopes in the hierarchy,
if any.  The new strategy is beneficial in cases when a user
has privileges granted on database, and those privileges imply
privileges on tables and columns.  In that case, once there was
a request to authorize an action on one table or a column of that table,
next request to authorize an action on another table or column of
another table will hit the cache, avoiding extra RPC to Sentry.
Another example that benefits from the new caching scheme is
AuthorizeDropTable(tableA) followed by AuthorizeCreateTable(tableA).

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 208 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 8: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Apr 2019 04:41:25 +0000
Gerrit-HasComments: No

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@633
PS5, Line 633:         requested_scope == SentryAuthorizableScope::COLUMN) {
> This is still possible in a RELEASE build.  Do you think we want to convert
I meant '... we can allow COLUMN requests without any issues -- the scope is artificially widened to TABLE at L541 ...'



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 19:17:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
The other scenario that benefits from the new caching scheme is a
a sequence of AuthorizeDropTable() followed by AuthorizeCreateTable().

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Reviewed-on: http://gerrit.cloudera.org:8080/13069
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Hao Hao <ha...@cloudera.com>
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 424 insertions(+), 60 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Andrew Wong: Looks good to me, approved
  Hao Hao: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>

[kudu-CR] [authz] updated SentryAuthzProvider caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] updated SentryAuthzProvider caching strategy
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/2/src/kudu/master/sentry_privileges_fetcher.cc@548
PS2, Line 548:   if (!handles.empty()) {
             :     SentryPrivilegesBranch result;
             :     for (const auto& e : handles) {
             :       DCHECK(e);
             :       result.Merge(e.value());
             :     }
             :     *privileges = std::move(result);
             :     return Status::OK();
             :   }
I suspect there might be a bug here.  Imagine that information on privileges on db.tableA has been fetched before, and at the db scope there is just metadata-related privileges, while at the table scope level there SELECT privilege granted on db.tableA.  Now, if there is a request to fetch privileges for db.tableB, on which the user is given SELECT privilege as well.  The information is not yet in the cache, and upon such a request this code will return privileges just for the db scope, so AuthzProvider will incorrectly deny SELECT on db.tableB.

Essentially, that means 1) it's necessary to fetch privileges on the requested table scope if such a record the record was not found 2) it's necessary to add corresponding entries in the cache even if the list of privilege records for database scope is empty.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 19 Apr 2019 22:39:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

LGTM, probably good to get a signoff from Hao re the tests she requested.

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/7/src/kudu/master/sentry_privileges_fetcher.cc@538
PS7, Line 538: FATAL) << Substitute(
             :         "requesting privileges of the SERVER scope from Sentry "
> From the other side, if thinking about running this code in kudu-master pro
Ah, great point! SGTM.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Apr 2019 00:45:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 5: Code-Review+1

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains server- and
           : database-scope privileges, and another contains table- and column-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the table scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on database.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(table), 
> I'm not sure the switch of the lookup order from {'/s/d', '/s/d/t'} to {'/s
Yeah, we had a few discussions about this with a good many possible ways to optimize the cache, and among all of them, it was tough for me to remember where we landed. That aside, the rationale and details should clear in a commit message or pointed at in a code comment for posterity anyway.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.t1", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   ASSERT_EQ(3, GetTasksSuccessful());
> I'm not sure I understand what would be clearer since those methods use TAB
I think it would be clearer because using Authorize doesn't presume knowledge of the privilege-checking policy for DDL, ie the fact that DropTable requires TABLE privileges. That said, I don't feel too strongly about it.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // Same story for requests for 'db1.t0', ..., 'db1.t19'.
             :   for (int idx = 0; idx < 20; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :         table_name, kTestUser, kTestUser));
             :   }
> I'm not sure I understand. The Authorize() method is a private method of th
Same here, I think it would be clearer because using Authorize doesn't presume knowledge of the privilege-checking policy for DDL, ie the fact that CreateTable requires DATABASE privileges. That said, I don't feel too strongly about it.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250
PS4, Line 250: key_sequence_max_size() method.
> This method can return a SERVER-scope key as well if the input for the cons
Ah, much clearer. Thanks!

Eventually we wouldn't expect SERVER-scoped keys though, right?


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@252
PS5, Line 252: autorizables
nit: authorizables


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@611
PS5, Line 611:     if (requested_scope == SentryAuthorizableScope::DATABASE ||
             :         requested_scope == SentryAuthorizableScope::TABLE ||
             :         requested_scope == SentryAuthorizableScope::COLUMN) {
Once we have the CHECK_NE SERVER, we should be able to get rid of the conditional here entirely, right? (not actionable right now, just checking)


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@633
PS5, Line 633:         requested_scope == SentryAuthorizableScope::COLUMN) {
This shouldn't be possible from L538, right? Same above.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 08:01:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> There are a bunch of details in here, but I think it'd be easier to follow,
Thanks for the detailed explanation. Could we further "micro-optimize" by switching the lookup order so we try /s/d/t before /s/d? The rationale here is that CreateTable is rare, and OpenTable is common (and "herdful"), so let's choose the order that'll yield better performance more often.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 21 Apr 2019 20:46:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.h@126
PS6, Line 126: 
             :   // Add/merge privileges from other instance of SentryPrivilegesBranch.
             :   void Merge(const SentryPrivilegesBranch& other);
             : 
             :   // Output the privileges into branches corresponding to DB-and-higher and
             :   // TABLE-and-lower authz scopes.
             :   void Split(SentryPrivilegesBranch* other_scope_db,
             :              SentryPrivilegesBranch* other_scope_table) const;
What do you think about making these operations destructive? Meaning:
- Merge should move privileges from 'other' rather than copying them.
- Split should move privileges to 'other_scope_{db,table}' rather than copying them.

AFAICT the use cases for Merge and Split would support this, and it'd be more performant.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@233
PS6, Line 233:   constexpr static size_t key_sequence_max_size() {
I think this would be more clear if defined as a constant rather than hidden behind a function, even if the constexpr means the compiler is smart enough to treat it as a constant.


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_privileges_fetcher.cc@422
PS6, Line 422:         scope_db.privileges_.push_back(e);
Let's use emplace_back() here and on L426 for new code.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 23:00:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 6: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   ASSERT_EQ(3, GetTasksSuccessful());
             : 
             :   // All the requests below should also hit the cache since the information on
             :   // the privileges granted on each of the tables in the requests below
             :   // is in the cache. In the Sentry's privileges model for Kudu, DROP TABLE
             :   // requires privileges on the table itself, but nothing is requred on the
             :   // database the table belongs to.
> Yep, I added comments in attempt to clarify on the policies reflecting the 
SGTM, thanks for clarifying!


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // while fetching the information privileges on 'db.t0' authorizable of the
             :   // TABLE scope.
             :   for (int idx = 0; idx < 10; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :    
> Yep, I agree that would be a bit clearer and that make sense.   However, I'
Ack


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@708
PS6, Line 708: requried
nit: required


http://gerrit.cloudera.org:8080/#/c/13069/6/src/kudu/master/sentry_authz_provider-test.cc@734
PS6, Line 734: requred
nit: required


http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/5/src/kudu/master/sentry_privileges_fetcher.cc@633
PS5, Line 633:         requested_scope == SentryAuthorizableScope::COLUMN) {
> This is still possible in a RELEASE build.  Do you think we want to convert
Possible or not in release build, I think it is unexpected to request column-level privileges, and whether it's DCHECK or CHECK, it would be caught in pre-commit regardless. I.e. we have no codepaths currently, nor do we ever expect to have codepaths requesting column-level privileges. I suppose it's somewhat harmless, but I think it's a nice guardrail.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 19:16:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@747
PS4, Line 747:   ASSERT_OK(AlterRoleGrantPrivilege(GetDatabasePrivilege("db0", "ALTER")));
             :   ASSERT_OK(AlterRoleGrantPrivilege(GetDatabasePrivilege("db1", "CREATE")));
Can you add a test that have table level privilege on a table but not database privilege on the database, the table belongs to. In that case, after authorize an operation on the table, will master issue another RPC to sentry when authorize an operation on the database?


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@763
PS4, Line 763: db'
Not following why mention 'db'  here as we are requesting for 'db1.tablex'?



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 06:44:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> There are a bunch of details in here, but I think it'd be easier to follow,
Thanks a lot for the write-up!  I assumed those details are known/obvious to the parties involved, but once you wrote that up, I think I'll copy-paste with a bit of editing and put into the commit message.


http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> Thanks for the detailed explanation. Could we further "micro-optimize" by s
I'm not sure the switch of the lookup order from {'/s/d', '/s/d/t'} to {'/s/d/t', '/s/d'} gives any optimization.  The scheme assumes it's necessary to find both '/s/d' and '/s/d/t' entries in the cache anyways if looking up for the information on a table-scope authorizable.  But I might be missing something here.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@729
PS4, Line 729:   Status s = sentry_authz_provider_->AuthorizeDropTable("db.t0", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.t1", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   s = sentry_authz_provider_->AuthorizeDropTable("db.other_table", kTestUser);
             :   ASSERT_TRUE(s.IsNotAuthorized()) << s.ToString();
             :   ASSERT_EQ(3, GetTasksSuccessful());
> nit: I think it would be clearer what this is trying to test if we used Aut
I'm not sure I understand what would be clearer since those methods use TABLE authorizables already: 'db.t0', 'db.t1', and 'db.other_table'.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_authz_provider-test.cc@776
PS4, Line 776:   // Same story for requests for 'db1.t0', ..., 'db1.t19'.
             :   for (int idx = 0; idx < 20; ++idx) {
             :     const auto table_name = Substitute("db1.t$0", idx);
             :     ASSERT_OK(sentry_authz_provider_->AuthorizeCreateTable(
             :         table_name, kTestUser, kTestUser));
             :   }
> nit: I think it would be clearer what this is trying to test if we used Aut
I'm not sure I understand. The Authorize() method is a private method of the AuthzProvider's interface.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h
File src/kudu/master/sentry_privileges_fetcher.h:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.h@127
PS4, Line 127:   // Add/merge privileges from other instance of SentryPrivilegesBranch.
> It's probably worth documenting that, while we don't explicitly enforce or 
I think that doesn't matter since the method doesn't expect any constraints in that regard at all.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc
File src/kudu/master/sentry_privileges_fetcher.cc:

http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@240
PS4, Line 240:   // Generate the 'raw' key sequence: a sequence of keys for the whole authz
             :   // scope hierarchy based on the scope hierarchy in 'authorizable'
             :   // for the specified 'user'.
> Would be helpful to add an example, especially since this can easily be con
Done


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@247
PS4, Line 247:  The lookup sequence
             :   // might differ from the 'raw' sequence: it all depends on how the cache
             :   // is populated given sanitized ListPrivileges() Sentry's response.
> I'm not sure what this means. I think you're trying to be vague in case thi
I'm not trying to be vague.  However, I think it's better to drop this comment to avoid misunderstanding if it seems vague, especially given the fact that this is a private method of a private sub-class.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@250
PS4, Line 250: As of now, the cache stores data only under DATABASE- and TABLE-scope keys.
> I think this means, "The returned key sequence will only contain DATABASE- 
This method can return a SERVER-scope key as well if the input for the constructor was a SERVER-level authorizable.

The idea behind the comment was to give the reader more context on this method.

The beginning of the comment for this methods says 'Generate the cache key lookup sequence'.  Then comes: 'the cache stores data only under DATABASE- and TABLE-scope keys'.  I'm not sure what do you mean by 'vague' here.

I added the description of what this method returns for every valid input.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@255
PS4, Line 255: // Convert Sentry authz scope into index of the raw key sequence.
> Reword to indicate a bit more why this is useful:
Done, but I don't think the second sentence brings any clarity or makes it easy to understand.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@273
PS4, Line 273:   if (!scope) {
             :     // The flattened key without scope level restrictions
             :     // is the very last element of the key_sequence_.
             :     return raw_key_sequence_.back();
             :   }
> Is this used?
That was the former contract of this method.  Right now it's not used, so I updated the signature of the method and the implementation.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@315
PS4, Line 315: 
             : // As of now, 
> nit: while this makes this more future proof, it also somewhat clutters com
Indeed -- the declaration has all necessary details already.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@329
PS4, Line 329:   if (sequence.empty()) {
             :     sequence.emplace_back(raw_sequence.back());
             :   }
> This seems to be here in case we request server-level privileges. I don't t
That happens in our current tests.  So, I think we need to keep it here until we remove those tests and add CHECK() precondition everywhere else.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@534
PS4, Line 534: scope(requested_scope);
> nit: would you mind reversing the names so that we use requested_scope thro
I got rid of the 'scope' variable: now the code uses 'requested_scope' as is.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@539
PS4, Line 539: // TODO(aserbin): uncomment when it's guaranteed we are not getting
             :   //                requests for authorizables of scope wider than DATABASE.
> I think this is already true; we never request privileges at the server sco
That's not yet true, at least for some of our tests.  This assertion triggers while running current set of tests.  I think I'll address that in a separate changelist, removing that test and uncommenting this assertion.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@545
PS4, Line 545: kScopeTable.scope())
> nit: I like these in that it means we don't have to do annoying things like
All right, it seems we are going back and forth from using Implies() and objects to counting cycles and bits.  I think we can settle with enums, indeed.


http://gerrit.cloudera.org:8080/#/c/13069/4/src/kudu/master/sentry_privileges_fetcher.cc@629
PS4, Line 629:  From the other side, the information on relevant
             :     // Kudu tables will be fetched as soon as it's requested directly and the
             :     // cache is missing the required entry. In other words, information on
             :     // privileges for authorizables of the SentryAuthorizableScope::TABLE scope
             :     // are fetched from Sentry explicitly.
> Reword:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Apr 2019 06:38:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

LGTM, just one minor nit.

http://gerrit.cloudera.org:8080/#/c/13069/9/src/kudu/master/sentry_authz_provider-test.cc
File src/kudu/master/sentry_authz_provider-test.cc:

http://gerrit.cloudera.org:8080/#/c/13069/9/src/kudu/master/sentry_authz_provider-test.cc@855
PS9, Line 855: ASSERT_EQ(1, GetTasksSuccessful());
Is it possible to check the size of the cache entries? If not, just ignore it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Apr 2019 06:42:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13069/4//COMMIT_MSG@8
PS4, Line 8: 
           : This patch updates the way how the privilege cache in
           : SentryAuthzProvider is populated.  Prior to this patch, only one entry
           : per sanitized Sentry's response was created.  With this patch,
           : a response may be split into two entries: one contains SERVER- and
           : DATABASE-scope privileges, and another contains TABLE- and COLUMN-scope
           : privileges.  Of course, it also changes the lookup process: now it's
           : necessary to search for two entries in the cache if looking up for
           : an entry with privileges for an authorizable of the TABLE scope.
           : 
           : The new caching strategy leverages the fact that Sentry includes
           : information on privileges granted on authorizables of higher scopes
           : in the hierarchy, if any.  The new strategy is beneficial in cases
           : when a user has privileges granted on DATABASE.  In that case, once
           : there was a request to authorize an action on a table or a column
           : of that table, next request to authorize an action on the database
           : itself will hit the cache, avoiding an extra RPC sent to Sentry.
           : Another example that benefits from the new caching scheme are
           : scenarios like AuthorizeDropTable(tableA) followed by
           : AuthorizeCreateTable(tableA).
> I think you're referring to optimizing which branch we copy? I don't think 
Yeah I misunderstood: I assumed that we'd "early out" if we got a hit in the first of two lookups. While desirable for "simple lookups" (i.e. am I allowed to create this table?) it's not actually correct for OpenTable, where we want to collect all privileges that a user might have on a table. Moreover, even for simple lookups, which of /s/d/t or /s/d succeed has nothing to do with the order in which they're executed, but everything to do with the underlying privileges in Sentry's database.

Sorry for the noise.



-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Apr 2019 22:33:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [authz] new SentryAuthzProvider's caching strategy

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/13069 )

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>