Posted to dev@sentry.apache.org by Na Li <li...@cloudera.com> on 2018/01/04 05:28:09 UTC

Re: Issue with SimpleCacheProviderBackend

Colm,

I tried to reproduce your issue using Sentry 2.0 (master branch) with Hive
2.3.2.

The test code is:

  @Test
  public void testPositiveOnAll() throws Exception {
    // As the admin: create a database and a table, and grant table-level
    // SELECT on the table to a role that is assigned to USERGROUP1.
    Connection connection = context.createConnection(ADMIN1);
    Statement statement = context.createStatement(connection);
    statement.execute("CREATE database " + DB1);
    statement.execute("use " + DB1);
    statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
    statement.execute("CREATE ROLE user_role1");
    statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
    statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
    statement.close();
    connection.close();

    // As a user in USERGROUP1: SELECT * must be authorized by the table-level grant.
    connection = context.createConnection(USER1_1);
    statement = context.createStatement(connection);
    statement.execute("use " + DB1);
    statement.execute("SELECT * FROM t1");

    statement.close();
    connection.close();
  }


Required privileges:

   - Server=server1->Db=db_1->Table=t1->Column=c1->action=select
   - Server=server1->Db=db_1->Table=t1->Column=c2->action=select


Cached privilege:

   - server=server1->db=db_1->table=t1->action=select

So the authorization works.

Note

   - For me, "SELECT * FROM t1" causes the required privileges to
   contain each column explicitly. For you, however, the "privilege" to check
   looks like:
   Server=server1->Db=authz->Table=words->action=select; the columns are
   not listed explicitly. Hive controls whether the columns are included in
   the required privileges.
   In org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings
   -> getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses the
   accessedColumns of each Hive input to add a colHierarchy for each column.
   You can check whether accessedColumns is empty or null in the Hive version
   you are using.
   - For me, the cached privilege does not include a column part. For you,
   the cached privilege is
   "Server=server1->Db=authz->Table=words->Column=*->action=select". Can you
   share your test code, so I can see how you grant the privilege such that
   the cached privilege contains a column?
      - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE user_role1",
      and got the following error:

        2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212) [WARN -
        org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
        Error executing statement:
        org.apache.hive.service.cli.HiveSQLException: Error while compiling
        statement: FAILED: ParseException line 1:6 cannot recognize input near
        'GRANT' 'SELECT' '(' in ddl statement
            at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
            at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
            at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
            at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
            at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
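
The first note above can be illustrated with a small model. This is a sketch, not Sentry's addColumnHierarchy: `RequiredPrivileges` and `requiredFor` are hypothetical names, the privilege strings simply mirror the ones quoted in this thread, and the only input assumed is the list of accessed columns Hive reports for a table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Sentry's HiveAuthzBindingHookBase: it only mimics the
// observable effect of addColumnHierarchy on the privilege strings to check.
public class RequiredPrivileges {

    // Builds the privilege strings to check for a SELECT on one table.
    // No accessed columns -> a single table-level string;
    // otherwise -> one column-level string per accessed column.
    public static List<String> requiredFor(String server, String db, String table,
                                           List<String> accessedColumns) {
        String tablePart = "Server=" + server + "->Db=" + db + "->Table=" + table;
        List<String> result = new ArrayList<>();
        if (accessedColumns == null || accessedColumns.isEmpty()) {
            result.add(tablePart + "->action=select");
        } else {
            for (String column : accessedColumns) {
                result.add(tablePart + "->Column=" + column + "->action=select");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hive reports the columns c1 and c2 (the setup in the test above).
        System.out.println(requiredFor("server1", "db_1", "t1", List.of("c1", "c2")));
        // Hive reports no accessed columns (Colm's setup): only a table-level check remains.
        System.out.println(requiredFor("server1", "authz", "words", List.of()));
    }
}
```

With c1 and c2 this yields the two column-level privileges listed above; with no columns it yields only the table-level privilege seen in Colm's setup.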

Thanks,

Lina

On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <co...@apache.org>
wrote:

> Thanks Kalyan! I was thinking that if a part of the cached privilege does
> not appear in the requested privilege, and its value is "all", then we
> should skip that part and continue on to the next one. But maybe there is
> a better solution.
>
> Colm.
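
A hypothetical model to compare the current behaviour with the skip proposed here. `ImpliesSketch.implies` is an illustration, not Sentry's CommonPrivilege.implies: it assumes values match when equal or when the granted value is "*", and it ignores Sentry's action hierarchy. With `skipWildcardParts` off it fails on "Column=*" in the way the quoted analysis of the "for" loop describes; with it on, a granted wildcard part that is absent from the request is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of privilege implication; NOT Sentry's
// CommonPrivilege. Values match if equal or if the granted value is "*";
// Sentry's action hierarchy is ignored.
public class ImpliesSketch {

    // Splits "Server=server1->Db=authz->action=select" into lower-cased [key, value] pairs.
    static List<String[]> parse(String privilege) {
        List<String[]> parts = new ArrayList<>();
        for (String kv : privilege.split("->")) {
            String[] pair = kv.split("=", 2);
            parts.add(new String[] {pair[0].toLowerCase(), pair[1].toLowerCase()});
        }
        return parts;
    }

    // Returns true if the granted (cached) privilege implies the requested one.
    // With skipWildcardParts, a granted resource part whose value is "*" and whose
    // key does not match the request at that position is skipped.
    public static boolean implies(String granted, String requested, boolean skipWildcardParts) {
        List<String[]> g = parse(granted);
        List<String[]> r = parse(requested);
        int gi = 0;
        for (String[] req : r) {
            if (gi >= g.size()) {
                return true; // granted is shorter: remaining requested parts are implied
            }
            String[] part = g.get(gi);
            while (skipWildcardParts && !part[0].equals(req[0]) && !part[0].equals("action")
                    && part[1].equals("*") && gi + 1 < g.size()) {
                part = g.get(++gi); // e.g. skip "Column=*" when no column was requested
            }
            if (!part[0].equals(req[0])) {
                if (part[0].equals("action")) {
                    continue; // a grant on a parent resource covers its children
                }
                return false; // e.g. granted "Column=*" vs requested "action=select"
            }
            if (!part[1].equals("*") && !part[1].equals(req[1])) {
                return false;
            }
            gi++;
        }
        // Any granted parts left over must be wildcards.
        for (; gi < g.size(); gi++) {
            if (!g.get(gi)[1].equals("*")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String granted = "Server=server1->Db=authz->Table=words->Column=*->action=select";
        String requested = "Server=server1->Db=authz->Table=words->action=select";
        System.out.println(implies(granted, requested, false)); // false
        System.out.println(implies(granted, requested, true));  // true
    }
}
```

In this model a table-level grant implies a column-level request in both modes (the action part of a parent grant covers its children), and a grant on a named column such as "Column=word" is never skipped, so the proposal only relaxes wildcard parts.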
>
> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <
> kkalyan@cloudera.com> wrote:
>
> > Colm,
> >
> > I will look closer into this today and see if I can help you out.
> >
> > -Kalyan
> >
> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <coheigea@apache.org>
> > wrote:
> >
> >> Hi,
> >>
> >> I've done some further analysis of the problem, and I think it is not
> >> directly related to SENTRY-1291. The problem manifests in
> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
> >> like:
> >>
> >> Server=server1->Db=authz->Table=words->Column=*->action=select
> >>
> >> The "privilege" I want to check looks like:
> >>
> >> Server=server1->Db=authz->Table=words->action=select;
> >>
> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
> >> the parts of the second privilege, and matches up to "action=select".
> >> Here it tries to compare to "Column=*" of the cached privilege and fails
> >> on this line:
> >>
> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
> >>
> >> It's clear there's a bug here somewhere, but I'm not sure where - can
> >> someone please advise?
> >>
> >> Thanks,
> >>
> >> Colm.
> >>
> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <li...@cloudera.com> wrote:
> >>
> >> > Sasha,
> >> >
> >> > SENTRY-1291 helps with the problem that Sentry privilege checks take
> >> > too long when there are many explicit grants, which matters for big
> >> > customers. Another approach that can improve performance is to organize
> >> > the privileges according to the authorization hierarchy in a tree
> >> > structure, so that finding a match in
> >> > ResourceAuthorizationProvider.doHasAccess() is in the order of log(N),
> >> > not linear in N, where N is the number of privileges.
> >> >
> >> > We can wait for Colm to confirm whether his issue is caused by
> >> > SENTRY-1291. If so, it may be fixed by selecting the cached privileges
> >> > for which the requested authorization object is a prefix, instead of
> >> > using an exact match.
> >> >
> >> > in SimplePrivilegeCache:
> >> >
> >> >   public Set<String> listPrivileges(Set<String> groups, Set<String> users,
> >> >       ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
> >> >     Set<String> privileges = new HashSet<>();
> >> >     Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
> >> >     for (StringBuilder authzKey : authzKeys) {
> >> >       // <- instead of exact matching, add an extension function to check
> >> >       //    if authzKey.toString() is a prefix of the keys of the entries
> >> >       //    in cachedAuthzPrivileges.
> >> >       if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
> >> >         privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
> >> >       }
> >> >     }
> >> >
> >> >     return privileges;
> >> >   }
> >> >
> >> > Thanks,
> >> >
> >> > Lina
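
A minimal sketch of the prefix-matching idea, under stated assumptions: the cache is modeled as a sorted `TreeMap` from lower-cased authorizable keys to privilege strings (Sentry's actual cache structure may differ), and `listExact`/`listByPrefix` are hypothetical methods. The sorted map also touches on the tree-structure suggestion: locating the first candidate key is O(log N) instead of a linear scan over all entries.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the prefix-matching idea; not Sentry's SimplePrivilegeCache.
// Keys use the cache format quoted in this thread, e.g.
// "server=server1->db=authz->table=words->column=*".
public class PrefixPrivilegeCache {
    // A sorted map keeps all keys that share a prefix next to each other.
    private final TreeMap<String, Set<String>> cache = new TreeMap<>();

    public void put(String authzKey, String privilege) {
        cache.computeIfAbsent(authzKey, k -> new HashSet<>()).add(privilege);
    }

    // Exact matching, like the listPrivileges loop quoted in this thread.
    public Set<String> listExact(String authzKey) {
        Set<String> found = cache.get(authzKey);
        return found == null ? new HashSet<>() : new HashSet<>(found);
    }

    // Prefix matching: the entry for authzKey plus every entry below it
    // ("authzKey->..."). tailMap finds the start of the range in O(log N).
    public Set<String> listByPrefix(String authzKey) {
        Set<String> found = listExact(authzKey);
        // Appending "->" avoids matching siblings such as "table=words2".
        String childPrefix = authzKey + "->";
        for (Map.Entry<String, Set<String>> e : cache.tailMap(childPrefix, true).entrySet()) {
            if (!e.getKey().startsWith(childPrefix)) {
                break; // keys are sorted, so no later key can share the prefix
            }
            found.addAll(e.getValue());
        }
        return found;
    }

    public static void main(String[] args) {
        PrefixPrivilegeCache cache = new PrefixPrivilegeCache();
        cache.put("server=server1->db=authz->table=words->column=*",
                "Server=server1->Db=authz->Table=words->Column=*->action=select");
        String lookup = "server=server1->db=authz->table=words";
        System.out.println(cache.listExact(lookup));    // []
        System.out.println(cache.listByPrefix(lookup)); // [Server=server1->Db=authz->Table=words->Column=*->action=select]
    }
}
```

Because a prefix lookup can also return grants on individual columns, the candidates would presumably still have to pass the implies check afterwards; the prefix only widens which cached entries are considered.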
> >> >
> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <akolb@cloudera.com>
> >> > wrote:
> >> >
> >> > > I think that SENTRY-1291 should just be reverted - there are multiple
> >> > > issues with it and no one is actually using the fix. Does anyone want
> >> > > to do it?
> >> > >
> >> > > - Alex
> >> > >
> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <li...@cloudera.com> wrote:
> >> > >
> >> > > > Colm,
> >> > > >
> >> > > > Glad you found the cause!
> >> > > >
> >> > > > You can revert SENTRY-1291 and see if it works. If so, the issue
> >> > > > is in finding the cached privileges.
> >> > > >
> >> > > > Cheers,
> >> > > >
> >> > > > Lina
> >> > > >
> >> > > > Sent from my iPhone
> >> > > >
> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <coheigea@apache.org>
> >> > > > > wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I can see what the problem is (that the authorization hierarchy does
> >> > > > > not contain the column, and hence doesn't match against the cached
> >> > > > > privilege), but I'm not sure about the best way to solve it. Either
> >> > > > > the way we are creating the authorization hierarchy is incorrect
> >> > > > > (e.g. in HiveAuthzBindingHookBase) or else the way we are parsing the
> >> > > > > cached privilege is incorrect (e.g. in
> >> > > > > SimplePrivilegeCache/CommonPrivilege).
> >> > > > >
> >> > > > > Colm.
> >> > > > >
> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <li...@cloudera.com> wrote:
> >> > > > >>
> >> > > > >> Colm,
> >> > > > >>
> >> > > > >> I did not get a chance to look into this issue today. Sorry about
> >> > > > >> that.
> >> > > > >>
> >> > > > >> You can add an e2e test case and set a breakpoint where the
> >> > > > >> authorization object hierarchy is converted to a list of
> >> > > > >> authorization objects, which is used to do an exact match against
> >> > > > >> the cache.
> >> > > > >>
> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <coheigea@apache.org>
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > > >>> That would be great, thanks!
> >> > > > >>>
> >> > > > >>> Colm.
> >> > > > >>>
> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <lina.li@cloudera.com> wrote:
> >> > > > >>>>
> >> > > > >>>> Colm,
> >> > > > >>>>
> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later today.
> >> > > > >>>>
> >> > > > >>>> Thanks,
> >> > > > >>>>
> >> > > > >>>> Lina
> >> > > > >>>>
> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <coheigea@apache.org>
> >> > > > >>>> wrote:
> >> > > > >>>>
> >> > > > >>>>> Hi all,
> >> > > > >>>>>
> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0 and the
> >> > > > >>>>> "v1" Hive binding (previously working fine using 1.8.0 and the "v2"
> >> > > > >>>>> binding).
> >> > > > >>>>>
> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT). I am
> >> > > > >>>>> making an SQL call as the user "bob", e.g. "SELECT * FROM words where
> >> > > > >>>>> count == '100'".
> >> > > > >>>>>
> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
> >> > > > >>>>>
> >> > > > >>>>> select_all_role =
> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
> >> > > > >>>>>
> >> > > > >>>>> Essentially, authorization is denied even though the policy is
> >> > > > >>>>> correct. If I look at the SimplePrivilegeCache, the cached privilege
> >> > > > >>>>> is:
> >> > > > >>>>>
> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
> >> > > > >>>>>
> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable hierarchy
> >> > > > >>>>> looks like:
> >> > > > >>>>>
> >> > > > >>>>> Server [name=server1]
> >> > > > >>>>> Database [name=authz]
> >> > > > >>>>> Table [name=words]
> >> > > > >>>>>
> >> > > > >>>>> There is no "column" here, and a match is not made against the
> >> > > > >>>>> cached privilege as a result. Is this a bug or am I missing some
> >> > > > >>>>> configuration switch?
> >> > > > >>>>>
> >> > > > >>>>> Colm.
> >> > > > >>>>>
> >> > > > >>>>>
> >> > > > >>>>> --
> >> > > > >>>>> Colm O hEigeartaigh
> >> > > > >>>>>
> >> > > > >>>>> Talend Community Coder
> >> > > > >>>>> http://coders.talend.com
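
To see why the exact lookup misses, here is a hypothetical model of turning the authorizable hierarchy into a cache key. The `Authorizable` class and `toKey` are illustrative, not Sentry's getAuthzKeys; the type name "Db" for the database level is an assumption chosen to match the cached key quoted above.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical model of deriving a cache lookup key from an authorizable
// hierarchy. The format mirrors the cache entry quoted above; it is not
// Sentry's getAuthzKeys implementation.
public class AuthzKeySketch {

    // Minimal stand-in for an authorizable: a type name and a resource name.
    public static final class Authorizable {
        final String typeName;
        final String name;

        public Authorizable(String typeName, String name) {
            this.typeName = typeName;
            this.name = name;
        }
    }

    // Joins "type=name" pairs with "->", lower-cased like the cache keys.
    public static String toKey(List<Authorizable> hierarchy) {
        StringJoiner key = new StringJoiner("->");
        for (Authorizable a : hierarchy) {
            key.add((a.typeName + "=" + a.name).toLowerCase());
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Key under which the grant on Column=* was cached.
        String cachedKey = "server=server1->db=authz->table=words->column=*";
        // Hierarchy passed to listPrivileges when Hive reports no accessed columns.
        String lookupKey = toKey(List.of(
                new Authorizable("Server", "server1"),
                new Authorizable("Db", "authz"),
                new Authorizable("Table", "words")));
        System.out.println(lookupKey);                   // server=server1->db=authz->table=words
        System.out.println(lookupKey.equals(cachedKey)); // false, so an exact lookup misses
    }
}
```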

Re: Issue with SimpleCacheProviderBackend

Posted by Colm O hEigeartaigh <co...@apache.org>.
Hi Lina,

No, I was using V1 - the problem was that I wasn't explicitly setting
"ConfVars.HIVE_STATS_COLLECT_SCANCOLS" to "true". What I meant was that if
I was using the V2 binding (and previously my testcase used the Sentry V2
binding with 1.8.0), then it wasn't necessary to set this configuration
variable.

Colm.

On Fri, Jan 5, 2018 at 5:34 PM, Na Li <li...@cloudera.com> wrote:

> Colm,
>
> I have created SENTRY-2118 to document this setting.
>
> It is strange that without this setting, you have V2 working. From the
> following code, the column info is not set in ReadEntity if
> HIVE_STATS_COLLECT_SCANCOLS is false.
>
> if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>   this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
> }
>
>
> Thanks,
>
> Lina
>
> On Fri, Jan 5, 2018 at 10:23 AM, Colm O hEigeartaigh <co...@apache.org>
> wrote:
>
>> Hi Lina,
>>
>>
>>> Glad I can help. Do you know what configuration caused the columns not
>>> to be parsed by Hive? Is it due to SessionState.get().isAuthorizationModeV2()
>>> == false?
>>>
>>
>> Yes exactly - I'm using the V1 binding.
>>
>> Colm.
>>
>>
>>>
>>> Thanks,
>>>
>>> Lina
>>>
>>> On Fri, Jan 5, 2018 at 6:12 AM, Colm O hEigeartaigh <coheigea@apache.org>
>>> wrote:
>>>
>>>> Hi Lina,
>>>>
>>>> Thanks a lot for your help on this! I was able to get the test to work by
>>>> adding the following config option:
>>>>
>>>> conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
>>>>
>>>> Colm.
>>>>
>>>> On Thu, Jan 4, 2018 at 10:06 PM, Na Li <li...@cloudera.com> wrote:
>>>>
>>>> > Colm,
>>>> >
>>>> > The following code shows where Hive sets the column info. You can debug
>>>> > into the Hive code and see why AccessedColumns is not set.
>>>> >
>>>> > The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:
>>>> >
>>>> >   boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
>>>> >       && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
>>>> >   if (isColumnInfoNeedForAuth
>>>> >       || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>>> >     ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
>>>> >     this.setColumnAccessInfo(
>>>> >         columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
>>>> >   }
>>>> >
>>>> >   this.LOG.info("Completed plan generation");
>>>> >   if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>>> >     this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
>>>> >   }
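
The two conditions in the SemanticAnalyzer code above can be reduced to plain boolean logic. This is a hypothetical model, not Hive code: the parameters stand in for SessionState.get().isAuthorizationModeV2(), HIVE_AUTHORIZATION_ENABLED and HIVE_STATS_COLLECT_SCANCOLS, and it assumes, as discussed in this thread, that the V1 binding only sees the columns attached to each ReadEntity.

```java
// Hypothetical model of the two conditions quoted from Hive's SemanticAnalyzer.
// The boolean parameters stand in for SessionState.get().isAuthorizationModeV2(),
// HIVE_AUTHORIZATION_ENABLED and HIVE_STATS_COLLECT_SCANCOLS; nothing here calls Hive.
public class ColumnAccessGate {

    // First condition: is column access analyzed at all?
    public static boolean analyzesColumns(boolean authzModeV2, boolean authzEnabled,
                                          boolean collectScanCols) {
        return (authzModeV2 && authzEnabled) || collectScanCols;
    }

    // Second condition: are the accessed columns copied into each ReadEntity?
    // Only HIVE_STATS_COLLECT_SCANCOLS matters here.
    public static boolean columnsOnReadEntity(boolean collectScanCols) {
        return collectScanCols;
    }

    public static void main(String[] args) {
        boolean[][] cases = {
            // {authzModeV2, authzEnabled, collectScanCols}
            {false, true, false}, // V1 binding, scancols not set: Colm's failing setup
            {false, true, true},  // V1 binding with scancols=true: the working fix
            {true, true, false},  // V2 mode: columns analyzed, but not put on ReadEntity
        };
        for (boolean[] c : cases) {
            System.out.printf("v2=%b scancols=%b -> analyzed=%b, onReadEntity=%b%n",
                    c[0], c[2], analyzesColumns(c[0], c[1], c[2]), columnsOnReadEntity(c[2]));
        }
    }
}
```

Under these assumptions, the V1 binding only gets column information when HIVE_STATS_COLLECT_SCANCOLS is true, which matches the fix Colm reported.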

Re: Issue with SimpleCacheProviderBackend

Posted by Na Li <li...@cloudera.com>.
Colm,

I have created SENTRY-2118 to document this setting.

It is strange that without this setting, you have V2 working. From the
following code, the column info is not set in ReadEntity if
HIVE_STATS_COLLECT_SCANCOLS is false.

if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
}


Thanks,

Lina

On Fri, Jan 5, 2018 at 10:23 AM, Colm O hEigeartaigh <co...@apache.org>
wrote:

> Hi Lina,
>
>
>> Glad I can help. Do you know what configuration caused the columns not
>> parsed by Hive? If it is due to SessionState.get().isAuthorizationModeV2()
>> == false?
>>
>
> Yes exactly - I'm using the V1 binding.
>
> Colm.

Re: Issue with SimpleCacheProviderBackend

Posted by Colm O hEigeartaigh <co...@apache.org>.
Hi Lina,


> Glad I can help. Do you know which configuration caused the columns not
> to be parsed by Hive? Is it because SessionState.get().isAuthorizationModeV2()
> == false?
>

Yes, exactly. I'm using the V1 binding.

Colm.
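For context, Colm's original report in this thread showed the cached key server=server1->db=authz->table=words->column=* while the lookup hierarchy stopped at the table. A minimal sketch of why an exact-key lookup misses that entry, and of the prefix-matching alternative Lina suggested, assuming a plain map keyed by the lower-cased authorizable path. The class is hypothetical and much simpler than SimplePrivilegeCache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the privilege cache discussed in this
// thread; not SimplePrivilegeCache itself.
final class PrivilegeCacheLookupSketch {
  // Lower-cased authorizable path -> privilege strings cached under it.
  private final Map<String, Set<String>> cachedAuthzPrivileges = new HashMap<>();

  void put(String authzKey, String privilege) {
    cachedAuthzPrivileges.computeIfAbsent(authzKey, k -> new HashSet<>()).add(privilege);
  }

  // Exact lookup, like cachedAuthzPrivileges.get(authzKey.toString()).
  Set<String> exactMatch(String authzKey) {
    Set<String> hit = cachedAuthzPrivileges.get(authzKey);
    return hit == null ? new HashSet<String>() : new HashSet<String>(hit);
  }

  // Prefix lookup: also returns privileges cached under deeper paths, matching
  // only on "->" boundaries so "table=word" does not match "table=words".
  Set<String> prefixMatch(String authzKey) {
    Set<String> result = new HashSet<>();
    for (Map.Entry<String, Set<String>> entry : cachedAuthzPrivileges.entrySet()) {
      String cachedKey = entry.getKey();
      if (cachedKey.equals(authzKey) || cachedKey.startsWith(authzKey + "->")) {
        result.addAll(entry.getValue());
      }
    }
    return result;
  }
}
```

Prefix matching only finds the candidate privilege. As Colm found, CommonPrivilege.implies must still accept a Column=* privilege for a table-level request, so this change alone may not be enough.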




-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com

Re: Issue with SimpleCacheProviderBackend

Posted by Na Li <li...@cloudera.com>.
Colm,

Glad I can help. Do you know which configuration caused the columns not
to be parsed by Hive? Is it because SessionState.get().isAuthorizationModeV2()
== false?
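The two checks from SemanticAnalyzer quoted below can be reduced to a small boolean model. This is a hypothetical sketch, not Hive code; the parameters stand in for isAuthorizationModeV2(), HIVE_AUTHORIZATION_ENABLED and HIVE_STATS_COLLECT_SCANCOLS.

```java
// Hypothetical boolean model of the two SemanticAnalyzer checks quoted in
// this thread; not Hive code.
final class ScanColsGateSketch {
  private ScanColsGateSketch() {}

  // First check: is ColumnAccessAnalyzer run at all?
  static boolean columnAccessAnalyzed(boolean authzModeV2, boolean authzEnabled,
                                      boolean collectScanCols) {
    boolean isColumnInfoNeedForAuth = authzModeV2 && authzEnabled;
    return isColumnInfoNeedForAuth || collectScanCols;
  }

  // Second check: are the accessed columns copied onto the ReadEntity inputs?
  // In the quoted code the copy is guarded only by HIVE_STATS_COLLECT_SCANCOLS;
  // the analysis check is kept to show the column info must exist first.
  static boolean columnsOnReadEntity(boolean authzModeV2, boolean authzEnabled,
                                     boolean collectScanCols) {
    return columnAccessAnalyzed(authzModeV2, authzEnabled, collectScanCols)
        && collectScanCols;
  }
}
```

Under this model, with the V1 binding and scan-column collection off, column analysis never runs, and setting HIVE_STATS_COLLECT_SCANCOLS to true (as Colm did) enables both checks. With V2 and scan-column collection off, columns are analyzed but not attached to the ReadEntity; the sketch does not model how the V2 authorizer consumes that information.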

Thanks,

Lina


Re: Issue with SimpleCacheProviderBackend

Posted by Colm O hEigeartaigh <co...@apache.org>.
Hi Lina,

Thanks a lot for your help on this! I was able to get the test to work by
adding the following config option:

conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
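Why this flag fixes the match: earlier in this thread the cached privilege was Server=server1->Db=authz->Table=words->Column=*->action=select, while without accessed columns the request was Server=server1->Db=authz->Table=words->action=select. A part-by-part walk therefore lines up the request's action=select against the cached Column=* and fails. With HIVE_STATS_COLLECT_SCANCOLS set, Hive attaches the accessed columns to its read entities, Sentry adds a Column part for each column, and the walk lines up. The following is a simplified, hypothetical model of that walk. The ImpliesModel class is made up for illustration; it is not Sentry's CommonPrivilege code and ignores details such as action hierarchies and "all" handling.

```java
// Hypothetical, simplified model of part-by-part privilege matching.
// It is NOT Sentry's CommonPrivilege implementation.
public class ImpliesModel {

    /**
     * Returns true if the granted privilege string implies the requested one.
     * Privileges are "key=value" parts joined by "->".
     */
    public static boolean implies(String granted, String requested) {
        String[] g = granted.split("->");
        String[] r = requested.split("->");
        int gi = 0; // current position in the granted privilege
        for (String reqPart : r) {
            if (gi >= g.length) {
                return true; // granted is shorter (broader): the rest is implied
            }
            String[] gkv = g[gi].split("=", 2);
            String[] rkv = reqPart.split("=", 2);
            if (!gkv[0].equalsIgnoreCase(rkv[0])) {
                if (gkv[0].equalsIgnoreCase("action")) {
                    continue; // let a granted action cover a child resource such as a column
                }
                return false; // e.g. granted Column=* lined up against requested action=select
            }
            if (!gkv[1].equals("*") && !gkv[1].equalsIgnoreCase(rkv[1])) {
                return false; // same key, different non-wildcard value
            }
            gi++;
        }
        // Simplification: granted parts left over (a narrower grant) are not implied.
        return gi >= g.length;
    }

    public static void main(String[] args) {
        String granted = "Server=server1->Db=authz->Table=words->Column=*->action=select";
        // Request built without accessed columns: fails at Column=* vs action=select
        System.out.println(implies(granted,
                "Server=server1->Db=authz->Table=words->action=select"));              // false
        // Request built with accessed columns: one request per column, e.g. "word"
        System.out.println(implies(granted,
                "Server=server1->Db=authz->Table=words->Column=word->action=select")); // true
        // Table-level grant against a column-level request, as in Lina's test
        System.out.println(implies("server=server1->db=db_1->table=t1->action=select",
                "Server=server1->Db=db_1->Table=t1->Column=c1->action=select"));       // true
    }
}
```

The third call mirrors Lina's table-level grant, which authorized per-column requests in her test. Letting a granted action part skip over a requested child resource is how this sketch reproduces that outcome; whether Sentry does it in exactly this way is an assumption of the model.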

Colm.



-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com

Re: Issue with SimpleCacheProviderBackend

Posted by Na Li <li...@cloudera.com>.
Colm,

The following code shows where Hive sets the column info. You can debug
into the Hive code to see why accessedColumns is not set.

The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:

        boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
            && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
        // Column access is analyzed for V2 authorization or when scan columns are collected
        if (isColumnInfoNeedForAuth
            || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
          ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
          this.setColumnAccessInfo(
              columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
        }

        this.LOG.info("Completed plan generation");
        // The accessed columns are copied onto the ReadEntity inputs only in this case
        if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
          this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
        }
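Reduced to its flags, the excerpt above says the column access analysis runs when V2 authorization is active or HIVE_STATS_COLLECT_SCANCOLS is on, but the accessed columns are copied onto the ReadEntity inputs (where Sentry's addColumnHierarchy looks for them) only when HIVE_STATS_COLLECT_SCANCOLS is on. So, within this excerpt and whatever the authorization mode, Sentry should see per-column inputs only if that flag is set. A minimal standalone model of just this logic follows; ColumnInfoModel and its methods are hypothetical, and nothing here calls Hive.

```java
// Hypothetical standalone model of the flag logic in the SemanticAnalyzer excerpt.
// It does not call Hive; it only answers "will Sentry see per-column inputs?".
public class ColumnInfoModel {

    /** Mirrors the first block: isColumnInfoNeedForAuth || HIVE_STATS_COLLECT_SCANCOLS. */
    public static boolean columnAccessAnalyzed(boolean authzModeV2, boolean authzEnabled,
                                               boolean statsCollectScanCols) {
        boolean isColumnInfoNeedForAuth = authzModeV2 && authzEnabled;
        return isColumnInfoNeedForAuth || statsCollectScanCols;
    }

    /** Mirrors the second block: columns reach the ReadEntity inputs only with scancols on. */
    public static boolean columnsOnReadEntities(boolean authzModeV2, boolean authzEnabled,
                                                boolean statsCollectScanCols) {
        return columnAccessAnalyzed(authzModeV2, authzEnabled, statsCollectScanCols)
                && statsCollectScanCols;
    }

    public static void main(String[] args) {
        // Not V2 mode, scancols off: no analysis, no columns on the inputs
        System.out.println(columnsOnReadEntities(false, true, false)); // false
        // V2 mode with authorization enabled: analysis runs, columns still not attached
        System.out.println(columnsOnReadEntities(true, true, false));  // false
        // Scancols on (the setting that made Colm's test pass): columns attached
        System.out.println(columnsOnReadEntities(false, true, true));  // true
    }
}
```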


On Wed, Jan 3, 2018 at 11:28 PM, Na Li <li...@cloudera.com> wrote:

> Colm,
>
> I tried to reproduce your issue using Sentry 2.0 (master branch) with Hive
> 2.3.2.
>
> The test code is
>
>   @Test
>   public void testPositiveOnAll() throws Exception {
>     Connection connection = context.createConnection(ADMIN1);
>     Statement statement = context.createStatement(connection);
>     statement.execute("CREATE database " + DB1);
>     statement.execute("use " + DB1);
>     statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
>     statement.execute("CREATE ROLE user_role1");
>     statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
>     statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
>     statement.close();
>     connection.close();
>
>     connection = context.createConnection(USER1_1);
>     statement = context.createStatement(connection);
>     statement.execute("use " + DB1);
>     statement.execute("SELECT * FROM t1");
>
>     statement.close();
>     connection.close();
>   }
>
>
> required privileges:
>
>    - Server=server1->Db=db_1->Table=t1->Column=c1->action=select
>    - Server=server1->Db=db_1->Table=t1->Column=c2->action=select
>
>
> cached privilege:
>
>    - server=server1->db=db_1->table=t1->action=select
>
> So the authorization works.
>
> Note
>
>    - For me, "SELECT * FROM t1" causes the required privileges to
>    contain each column explicitly. For you, however, the privilege to check
>    looks like:
>    Server=server1->Db=authz->Table=words->action=select; the columns are
>    not explicitly listed. Hive controls whether the columns are included in
>    the required privilege. In the call chain
>    org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings
>    -> getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
>    accessedColumns from the Hive input to add a colHierarchy for each column.
>    You can check whether accessedColumns is empty or null for the Hive version
>    you are using.
>    - For me, the cached privilege does not include column part. For you,
>    the cached privilege is "Server=server1->Db=authz->Table=words->
>    *Column=**->action=select". *Can you share your test code*, so I can
>    see how you grant the privilege and therefore the cached privilege contains
>    column?
>       - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE user_role1" and
>       got the following error:
>
>       2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212)
>       [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
>       Error executing statement:
>       org.apache.hive.service.cli.HiveSQLException: Error while compiling
>       statement: FAILED: ParseException line 1:6 cannot recognize input near
>       'GRANT' 'SELECT' '(' in ddl statement
>           at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>           at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>           at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>           at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>           at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>
> Thanks,
>
> Lina
>
> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <coheigea@apache.org> wrote:
>
>> Thanks Kalyan! I was thinking that if a part of the cached privilege does
>> not appear in the requested privilege, and its value is "all" (*), then we
>> should skip that part and continue on to the next one. But maybe there is
>> a better solution.
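A simplified sketch of that idea, assuming privileges are compared part by part in order, as described in the CommonPrivilege.implies analysis in this thread. The class and its methods are hypothetical; this is not Sentry's CommonPrivilege code, and real matching (action names such as "all", model-specific rules) is more involved. The "cached action part skips a deeper request part" rule is an assumption modeled on Lina's t1 test, where a table-level grant satisfied column-level requests. The skipUnmatchedWildcards flag switches between strict positional matching and the proposed skip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified positional implies() with an optional wildcard skip.
public class WildcardSkipSketch {

  // "Server=s1->Db=authz->action=select" -> [[server, s1], [db, authz], [action, select]]
  // Assumes every part has the form key=value.
  static List<String[]> parse(String privilege) {
    List<String[]> parts = new ArrayList<>();
    for (String part : privilege.split("->")) {
      String[] kv = part.split("=", 2);
      parts.add(new String[] {kv[0].trim().toLowerCase(Locale.ROOT),
          kv[1].trim().toLowerCase(Locale.ROOT)});
    }
    return parts;
  }

  static boolean implies(String cached, String requested, boolean skipUnmatchedWildcards) {
    List<String[]> have = parse(cached);
    List<String[]> want = parse(requested);
    int i = 0; // position in the cached privilege
    for (String[] w : want) {
      if (skipUnmatchedWildcards) {
        // Proposed tweak: skip cached "*" parts whose key differs from the request's.
        while (i < have.size() && !have.get(i)[0].equals(w[0]) && have.get(i)[1].equals("*")) {
          i++;
        }
      }
      if (i >= have.size()) {
        return true; // cached privilege is shorter: the rest is implied
      }
      String[] h = have.get(i);
      if (!h[0].equals(w[0])) {
        if (h[0].equals("action")) {
          continue; // assumed action inheritance: skip the deeper request part
        }
        return false; // e.g. cached "column" vs requested "action"
      }
      if (!h[1].equals("*") && !h[1].equals(w[1])) {
        return false;
      }
      i++;
    }
    // Leftover cached parts are acceptable only if they are wildcards.
    for (; i < have.size(); i++) {
      if (!have.get(i)[1].equals("*")) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String cached = "Server=server1->Db=authz->Table=words->Column=*->action=select";
    String requested = "Server=server1->Db=authz->Table=words->action=select";
    System.out.println(implies(cached, requested, false)); // false: strict match fails
    System.out.println(implies(cached, requested, true));  // true: Column=* is skipped
  }
}
```

Only wildcard parts are skipped, so a grant on a single column (Column=word) still does not imply a table-level request. Whether a Column=* grant should satisfy a table-level check remains the design question raised in this thread.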
>>
>> Colm.
>>
>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <kkalyan@cloudera.com> wrote:
>>
>> > Colm,
>> >
>> > I will look more closely into this today and see if I can help you out.
>> >
>> > -Kalyan
>> >
>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <coheigea@apache.org> wrote:
>> >
>> >> Hi,
>> >>
>> >> I've done some further analysis of the problem, and I think it is not
>> >> directly related to SENTRY-1291. The problem manifests in
>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
>> >> like:
>> >>
>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
>> >>
>> >> The "privilege" I want to check looks like:
>> >>
>> >> Server=server1->Db=authz->Table=words->action=select;
>> >>
>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
>> >> the parts of the second privilege, and matches up to "action=select".
>> >> Here it tries to compare to "Column=*" of the cached privilege and fails
>> >> on this line:
>> >>
>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
>> >>
>> >> It's clear there's a bug here somewhere, but I'm not sure where - can
>> >> someone please advise?
>> >>
>> >> Thanks,
>> >>
>> >> Colm.
>> >>
>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <li...@cloudera.com> wrote:
>> >>
>> >> > Sasha,
>> >> >
>> >> > sentry-1291 is helpful for the problem that sentry privilege checks
>> >> takes
>> >> > too long with many explicit grants, which is useful for big
>> customers.
>> >> > Another approach that can improve the performance is to organize the
>> >> > privileges according to the authorization hierarchy in a tree
>> >> structure, so
>> >> > finding match in ResourceAuthorizationProvider.doHasAccess() is in
>> the
>> >> > order of log(N), not linear of N, where N is the number of
>> privileges.
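A minimal sketch of such a tree, assuming each privilege is stored at the node for its authorizable path and that a request walks only its own path. The walk costs one hash lookup per hierarchy level plus the size of the result, which is one way to avoid a linear scan over all N privileges. PrivilegeTree and its methods are hypothetical, not Sentry classes, and intermediate wildcards such as db=* are not handled.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical privilege index organized by the authorization hierarchy.
public class PrivilegeTree {

  private static final class Node {
    final Map<String, Node> children = new HashMap<>();
    final Set<String> privileges = new LinkedHashSet<>();
  }

  private final Node root = new Node();

  // path is e.g. ["server=server1", "db=authz", "table=words", "column=*"]
  public void add(List<String> path, String privilege) {
    Node node = root;
    for (String part : path) {
      node = node.children.computeIfAbsent(part, k -> new Node());
    }
    node.privileges.add(privilege);
  }

  // Collects privileges stored on the request's path (ancestors may imply
  // descendants). With includeDescendants, privileges stored below the last
  // node (e.g. column-level grants under a table) are returned too, leaving
  // the final implies() decision to the caller.
  public Set<String> candidates(List<String> requestPath, boolean includeDescendants) {
    Set<String> result = new LinkedHashSet<>();
    Node node = root;
    for (String part : requestPath) {
      node = node.children.get(part);
      if (node == null) {
        return result; // path ends here: only ancestors can match
      }
      result.addAll(node.privileges);
    }
    if (includeDescendants) {
      collectSubtree(node, result);
    }
    return result;
  }

  private static void collectSubtree(Node node, Set<String> out) {
    for (Node child : node.children.values()) {
      out.addAll(child.privileges);
      collectSubtree(child, out);
    }
  }

  public static void main(String[] args) {
    PrivilegeTree tree = new PrivilegeTree();
    tree.add(List.of("server=server1", "db=authz", "table=words", "column=*"),
        "server=server1->db=authz->table=words->column=*->action=select");
    List<String> request = List.of("server=server1", "db=authz", "table=words");
    System.out.println(tree.candidates(request, false)); // []
    System.out.println(tree.candidates(request, true));  // the column=* grant
  }
}
```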
>> >> >
>> >> > We can wait for Colm to confirm whether his issue is caused by
>> >> > SENTRY-1291. If so, it may be fixed by selecting the cached privileges
>> >> > whose key has the requested authorization object as a prefix, instead
>> >> > of requiring an exact match.
>> >> >
>> >> > In SimplePrivilegeCache:
>> >> >
>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
>> >> >     ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
>> >> >   Set<String> privileges = new HashSet<>();
>> >> >   Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
>> >> >   for (StringBuilder authzKey : authzKeys) {
>> >> >     // Instead of exact matching, add an extension function to check
>> >> >     // whether authzKey.toString() is a prefix of the keys of the
>> >> >     // entries in cachedAuthzPrivileges.
>> >> >     if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
>> >> >       privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
>> >> >     }
>> >> >   }
>> >> >
>> >> >   return privileges;
>> >> > }
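A minimal sketch of the prefix lookup, assuming cache keys use the lower-case "key=value->key=value" form shown in this thread. It keeps the keys in a java.util.TreeMap so that all keys sharing a prefix are contiguous: the lookup seeks to the prefix in O(log N) and scans only that range. PrefixPrivilegeLookup is a hypothetical class, not Sentry's SimplePrivilegeCache. A plain startsWith() check would also match "table=words2" when asked for "table=words", so the sketch requires the match to end at a "->" boundary.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical prefix lookup over authz keys such as
// "server=server1->db=authz->table=words->column=*".
public class PrefixPrivilegeLookup {

  private static final String SEPARATOR = "->";

  // Sorted keys: every key with a given prefix sits in one contiguous range.
  private final NavigableMap<String, Set<String>> cache = new TreeMap<>();

  public void put(String authzKey, String privilege) {
    cache.computeIfAbsent(authzKey, k -> new TreeSet<>()).add(privilege);
  }

  // Returns privileges whose key equals authzKey or extends it by more "->" parts.
  public Set<String> lookup(String authzKey) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Set<String>> e : cache.tailMap(authzKey, true).entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(authzKey)) {
        break; // left the contiguous range of keys with this prefix
      }
      boolean atBoundary = key.length() == authzKey.length()
          || key.startsWith(SEPARATOR, authzKey.length());
      if (atBoundary) {
        result.addAll(e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    PrefixPrivilegeLookup lookup = new PrefixPrivilegeLookup();
    lookup.put("server=server1->db=authz->table=words->column=*",
        "server=server1->db=authz->table=words->column=*->action=select");
    lookup.put("server=server1->db=authz->table=words2",
        "server=server1->db=authz->table=words2->action=select");

    String requested = "server=server1->db=authz->table=words";
    System.out.println(lookup.cache.get(requested)); // exact match: null
    System.out.println(lookup.lookup(requested));    // prefix: the column=* entry only
  }
}
```

This only widens the candidate set. The final decision is still made by implies(), and per the CommonPrivilege analysis in this thread, a strict positional implies() would still reject Column=* against a table-level request.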
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Lina
>> >> >
>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <akolb@cloudera.com> wrote:
>> >> >
>> >> > > I think SENTRY-1291 should just be reverted: there are multiple
>> >> > > issues with it and no one is actually using the fix. Does anyone
>> >> > > want to do it?
>> >> > >
>> >> > > - Alex
>> >> > >
>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <li...@cloudera.com> wrote:
>> >> > >
>> >> > > > Colm,
>> >> > > >
>> >> > > > Glad you found the cause!
>> >> > > >
>> >> > > > You can revert SENTRY-1291 and see if that works. If it does, the
>> >> > > > issue is in finding cached privileges.
>> >> > > >
>> >> > > > Cheers,
>> >> > > >
>> >> > > > Lina
>> >> > > >
>> >> > > > Sent from my iPhone
>> >> > > >
>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <coheigea@apache.org> wrote:
>> >> > > > >
>> >> > > > > Hi,
>> >> > > > >
>> >> > > > > I can see what the problem is (that the authorization hierarchy
>> >> > > > > does not contain the column, and hence doesn't match against the
>> >> > > > > cached privilege), but I'm not sure about the best way to solve
>> >> > > > > it. Either the way we are creating the authorization hierarchy is
>> >> > > > > incorrect (e.g. in HiveAuthzBindingHookBase) or else the way we
>> >> > > > > are parsing the cached privilege is incorrect (e.g. in
>> >> > > > > SimplePrivilegeCache/CommonPrivilege).
>> >> > > > >
>> >> > > > > Colm.
>> >> > > > >
>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <li...@cloudera.com> wrote:
>> >> > > > >>
>> >> > > > >> Colm,
>> >> > > > >>
>> >> > > > >> I did not get a chance to look into this issue today. Sorry
>> >> > > > >> about that.
>> >> > > > >>
>> >> > > > >> You can add an e2e test case and set a breakpoint where the
>> >> > > > >> authorization object hierarchy is converted into a list of
>> >> > > > >> authorization objects, which is used for the exact match
>> >> > > > >> against the cache.
>> >> > > > >>
>> >> > > > >>
>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <coheigea@apache.org> wrote:
>> >> > > > >>>
>> >> > > > >>> That would be great, thanks!
>> >> > > > >>>
>> >> > > > >>> Colm.
>> >> > > > >>>
>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <lina.li@cloudera.com> wrote:
>> >> > > > >>>>
>> >> > > > >>>> Colm,
>> >> > > > >>>>
>> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later today.
>> >> > > > >>>>
>> >> > > > >>>> Thanks,
>> >> > > > >>>>
>> >> > > > >>>> Lina
>> >> > > > >>>>
>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <coheigea@apache.org> wrote:
>> >> > > > >>>>
>> >> > > > >>>>> Hi all,
>> >> > > > >>>>>
>> >> > > > >>>>> I've updated some local test cases to work with Sentry 2.0.0
>> >> > > > >>>>> and the "v1" Hive binding (they previously worked fine with
>> >> > > > >>>>> 1.8.0 and the "v2" binding).
>> >> > > > >>>>>
>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT).
>> >> > > > >>>>> I am making an SQL call as the user "bob", e.g. "SELECT * FROM
>> >> > > > >>>>> words where count == '100'".
>> >> > > > >>>>>
>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
>> >> > > > >>>>>
>> >> > > > >>>>> select_all_role =
>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
>> >> > > > >>>>>
>> >> > > > >>>>> Essentially, authorization is denied even though the policy is
>> >> > > > >>>>> correct. If I look at the SimplePrivilegeCache, the cached
>> >> > > > >>>>> privilege is:
>> >> > > > >>>>>
>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
>> >> > > > >>>>>
>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable
>> >> > > > >>>>> hierarchy looks like:
>> >> > > > >>>>>
>> >> > > > >>>>> Server [name=server1]
>> >> > > > >>>>> Database [name=authz]
>> >> > > > >>>>> Table [name=words]
>> >> > > > >>>>>
>> >> > > > >>>>> There is no "column" here, and a match is not made against the
>> >> > > > >>>>> cached privilege as a result. Is this a bug, or am I missing
>> >> > > > >>>>> some configuration switch?
>> >> > > > >>>>>
>> >> > > > >>>>> Colm.
>> >> > > > >>>>>
>> >> > > > >>>>>
>> >> > > > >>>>> --
>> >> > > > >>>>> Colm O hEigeartaigh
>> >> > > > >>>>>
>> >> > > > >>>>> Talend Community Coder
>> >> > > > >>>>> http://coders.talend.com
>> >> > > > >>>>>
>> >> > > > >>>>
>> >> > > > >>>
>> >> > > > >>>
>> >> > > > >>>
>> >> > > > >>
>> >> > > > >
>> >> > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>