You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by Sergio Pena <se...@cloudera.com> on 2017/09/29 19:59:32 UTC

Concerns about Hive authz2 support

Hi All,

We are running into some problems with the support of Hive Authz V2
especially related to the workaround that parses Hive command strings in
Sentry using regular expressions to get some info that Hive is not sending
through the authz2 api. Hive 2.0 made some changes on commands that caused
issues with Sentry. These are fixed but the concern of doing this SQL
parsing exists. We asked the Hive community to give us extra SQL
information, but we cannot implement them in Sentry until a Hive release is
done. There are some concerns about the quality of authz2 too, such as
create/drop table and functions calling Sentry twice for authorization and
the lack of testing being done on it.

The original idea for Sentry 2.0 (future release) was to drop authz1
support and use authz2 as default but the work is getting delayed until
Hive releases something. Now that we bumped the Hive version to 2.0, I was
wondering if we should continue with authz1 and keep authz2 as an
experimental support until Hive releases something we can consume to fix
our issues. Then we can deprecate authz1 in a future 2.x release and remove
it in a major version.

I was thinking if we remove any hive-authz2 profile and just add the
hive-authz2 classes to the current sentry-binding-hive module so that users
are allowed to switch either to v1 or v2 (for testing). Also for the tests,
find a way to run all sentry-tests-hive with v1 and v2 to validate the
quality of it.

What does the PMC community think? Is it a good or bad idea?

- Sergio

Re: Concerns about Hive authz2 support

Posted by Na Li <li...@cloudera.com>.
Sergio,

Regarding input, what I found is below

 context of input at  semantic hook postAnalyze() for command ("SELECT
COUNT(A) FROM VIEW_1").
inputs[0] db_1@view_1  accessedColumns size = 1 value is "a"
       inputs[1] db_1@tab_1 accessedColumns  size = 0

       input of checkPrivileges
       inputHObjs size = 1

 input of checkPrivileges()
       inputHObjs size = 1
       inputHobjs[0].dbname = "db_1"
       inputHobjs[0].objectName = "view_1"
       inputHobjs[0].colums size = 1
       inputHobjs[0].colums[0] = "a"

So the input at DefaultSentryValidator.checkPrivileges() is the most
desirable. Hive passes in the exact info to sentry for authorization. If
user has privilege for accessing the view, but does not have privilege for
accessing the table, the authorization will succeed.

In semantic hook, Hive passes to Sentry correct info for view, but it also
passes extra info for table, which is not used in the command. If user has
privilege for accessing the view, but does not have privilege for accessing
the table, the authorization will fail.

Thanks,

Lina

On Tue, Oct 10, 2017 at 6:44 PM, Sergio Pena <se...@cloudera.com>
wrote:

> I was reviewing the hive-authz2 support, and so far the concerns are only
> on the checkPrivileges() method that is called by Hive to request access to
> a Hive object. This is similar to what HiveAuthzBindingHook does on the
> hive-authz1 profile.
>
> So, I think I can do a mix of work. Use the authz2 work for grant/revoke
> tasks and filtering HMS metadata + continuing using the
> HiveAuthzBindingHook (or Semantic hooks) class to request access to an
> object. This way we can use what is already tested by Sentry for years.
> Also, these semantic hooks are not part of authz1 in Hive so we are sure
> they won't be deleted in the future.
>
> As an extra info on the difference in Hive about the semantic hooks and the
> authorization V2 is just the parameters that are passed. Hive calls both,
> Semantic hooks and doAuthorizationV2, in the same scope (Driver.compile),
> so there is no benefit for us to continue investing effort on fixing the
> checkPrivileges and Hive in order to make it work (at least no in the
> short-term).
>
> Here's the patch 1 of 2 I'm working on:
> https://issues.apache.org/jira/browse/SENTRY-1978
>
> A 2nd patch will include another part of the hive-auth2 for filtering HMS
> metadata.
> Let me know if you guys have a concern about it
>
> - Sergio
>
>
>
> On Tue, Oct 3, 2017 at 4:40 PM, Sergio Pena <se...@cloudera.com>
> wrote:
>
> > Aaah, I missed the $hive-v2.version property. I see we've been using Hive
> > 2.0 for authz2 for a while.
> >
> > The concern for authz2 is about parsing Hive SQL strings that could
> change
> > in future Hive versions, and when attempting to remove that, then other
> > issues (that Na Li explained) started happening. The current authz2 works
> > fine with the SQL parsing workaround that looks weird to do in Sentry.
> >
> > I thought a little more about the profiles, and it's good to idea keep
> > both profiles for a while to avoid more complications later. Still, we
> need
> > to really test this authz2 very well before switching to it as the
> default.
> >
> > Regarding Hive fixes, they're currently targeted to Hive 3.0. We can ask
> a
> > backport to Hive 2.x but I don't know when they're going to release a new
> > Hive 2.x version. This could delay our Sentry 2.0 + HA release if we want
> > to wait for those releases.
> >
> > What should we do? Should we wait for a Hive release and verify authz2
> > works correctly? or should we keep
> > authz1 as the default and release something soon?
> >
> > - Sergio
> >
> > On Tue, Oct 3, 2017 at 11:33 AM, Na Li <li...@cloudera.com> wrote:
> >
> >> Colm and Sergio,
> >>
> >> https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we
> found
> >> when integrating hive with sentry for auth-2.
> >>
> >> The issues we found for auth-2 when integrating with Hive are:
> >>
> >> 1. When dropping non-exist database, hive does not pass name of the
> >> non-exist database to DefaultSentryValidator.checkPrivileges(). It
> seems
> >> it is hard for hive to fix it.
> >> 2. When dropping non-exist table, hive does not pass name of the
> >> non-exist table to DefaultSentryValidator.checkPrivileges(). It seems
> it
> >> is hard for hive to fix it.
> >> 3. When creating or dropping function, hive does not pass class name of
> >> the function. HIVE-17544
> >> <https://issues.apache.org/jira/browse/HIVE-17544> is fixing this
> issue.
> >>
> >> Sentry should remove parsing hive command, and it requires hive update.
> >> We are still working on how to fix those issues.
> >>
> >> Thanks,
> >>
> >> Lina
> >>
> >> On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh <
> coheigea@apache.org>
> >> wrote:
> >>
> >>> Thanks for that detailed answer, Sergio. Just to correct you on
> something
> >>> though:
> >>>
> >>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems
> it
> >>> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
> >>>
> >>> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
> >>> e.g.:
> >>>
> >>> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a5
> >>> 3d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79
> >>>
> >>> I know it works because I tested it with Hive 2.0.0 (
> >>> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)
> >>>
> >>> > Now that we bumped the Hive version to 2.0, I was wondering if we
> >>> should
> >>> continue with authz1 and keep authz2 as an
> >>> > experimental support until Hive releases something we can consume to
> >>> fix
> >>> > our issues. Then we can deprecate authz1 in a future 2.x release and
> >>> remove
> >>> > it in a major version.
> >>>
> >>> Yes I think this makes sense, given the concerns you have raised. Do
> you
> >>> have a timeline on when the Hive issues are likely to be fixed? Maybe
> it
> >>> could be done before Sentry 2.0.0 in which case we could drop authz1
> >>> anyway
> >>> for 2.0.0?
> >>>
> >>> > I was thinking if we remove any hive-authz2 profile and just add the
> >>> > hive-authz2 classes to the current sentry-binding-hive module so that
> >>> users
> >>> > are allowed to switch either to v1 or v2 (for testing). Also for the
> >>> tests,
> >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate
> the
> >>> > quality of it.
> >>>
> >>> Removing the hive-authz2 profile though could make it more difficult to
> >>> remove the authz1 functionality in the future, as it will be less clear
> >>> where the demarcation is between the two. Definitely it makes sense to
> >>> run
> >>> the tests with both versions if possible.
> >>>
> >>> Colm.
> >>>
> >>> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <sergio.pena@cloudera.com
> >
> >>> wrote:
> >>>
> >>> > Sure.
> >>> >
> >>> > First, here's what Hive Wiki says about authz1 limitations:
> >>> >
> >>> > The default authorization in Hive
> >>> > <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
> >>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
> >>> uthorization(LegacyMode)>
> >>> >  is *not designed* with the intent to protect against malicious users
> >>> > accessing data they should not be accessing. It only helps in
> >>> preventing
> >>> > users from accidentally doing operations they are not supposed to do.
> >>> It is
> >>> > also incomplete because it does not have authorization checks for
> many
> >>> > operations including the grant statement. The authorization checks
> >>> happen
> >>> > during Hive query compilation. But as the user is allowed to
> >>> > execute dfs commands, user-defined functions and shell commands, it
> is
> >>> > possible to bypass the client security checks.
> >>> >
> >>> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> >>> > ndard+Based+Hive+Authorization
> >>> >
> >>> > The above problem is the reason Hive introduced a new authorization
> API
> >>> > called authz2. However, I saw that some of those limitations are
> >>> handled by
> >>> > Sentry already, such as GRANT privilege checks (on the Sentry server
> >>> side).
> >>> > Also, Sentry provides the SentryGrantRevokeTask to handle the
> >>> GRANT/REVOKE
> >>> > execution instead of using the authz1 API that Hive provides.
> >>> >
> >>> > Authz1 uses the following configurations:
> >>> >
> >>> >    - *hive.security.authorization.ma
> >>> >    <http://hive.security.authorization.ma>nager*=(implementation of
> >>> >    HiveAuthorizerFactory)
> >>> >
> >>> >
> >>> >    - *hive.security.authorization.enabled*=true
> >>> >
> >>> >
> >>> > There is, though, a HiveAuthorizerFactory implementation on Sentry
> when
> >>> > bumping the version to Hive 0.13, but it does not provide the
> >>> Controller
> >>> > nor Validator classes to handle authorization for v1. These classes
> >>> were
> >>> > introduced in Sentry later to support authz2.
> >>> >
> >>> > Based on the above, I think that Sentry does not use authz1
> completely
> >>> as
> >>> > it provides the hooks necessary to the semantic analyzer and task
> >>> execution
> >>> > to provide that support (correct me if I'm wrong).
> >>> >
> >>> > Nevertheless, the authz2 provides other functionalities that it would
> >>> be
> >>> > good to support, such as DFS commands authorization and keep HMS
> client
> >>> > filtering, GRANT/REVOKE executions and privileges checks in one class
> >>> > (HiveAuthorizerFactory) instead of 3 that Sentry provides.
> >>> >
> >>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems
> it
> >>> was
> >>> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
> >>> >
> >>> > Another proposal is to keep authz1 as default for Sentry 2.0 like
> >>> Sentry
> >>> > 1.8 provides, and deprecate it later in the Sentry 2.x line once
> >>> authz2 is
> >>> > stable and we bump to newer versions of Hive 2 with fixes on this.
> >>> >
> >>> > The configuration difference is:
> >>> >
> >>> > *Sentry authz1:*
> >>> > HMS
> >>> >   MetastoreAuthzBinding for HMS server authorization.
> >>> >
> >>> > HS2
> >>> >   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
> >>> >   SentryMetaStoreFilterHook for hms client filtering.
> >>> >   SentryHiveAuthorizationTaskFactoryImpl that creates the
> >>> > SentryGrantRevokeTask.
> >>> >   HiveAuthzBindingHook that checks privileges during the semantic
> >>> analyzer.
> >>> >   SentryGrantRevokeTask that executes adds/removes privileges on the
> >>> > Sentry server.
> >>> >
> >>> > *Sentry authz2:*
> >>> > HMS
> >>> >   MetastoreAuthzBinding for HMS server authorization.
> >>> >
> >>> > HS2
> >>> >   HiveAuthzBindingSessionHookV2 for configuring semantic/filter
> hooks.
> >>> >   SentryHiveAuthorizer that calls a Controller or Validator depending
> >>> on
> >>> > the authorization request.
> >>> >   SentryHiveAccessController to grant/revoke commands.
> >>> >   SentryAuthorizationValidator for HMS client filtering and check
> >>> > privileges.
> >>> >
> >>> >
> >>> > - Sergio
> >>> >
> >>> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <
> sergio.pena@cloudera.com
> >>> >
> >>> > wrote:
> >>> >
> >>> >> Sure.
> >>> >>
> >>> >> First, here's what Hive Wiki says about authz1 limitations:
> >>> >>
> >>> >> The default authorization in Hive
> >>> >> <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
> >>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
> >>> uthorization(LegacyMode)>
> >>> >>  is *not designed* with the intent to protect against malicious
> users
> >>> >> accessing data they should not be accessing. It only helps in
> >>> preventing
> >>> >> users from accidentally doing operations they are not supposed to
> do.
> >>> It is
> >>> >> also incomplete because it does not have authorization checks for
> many
> >>> >> operations including the grant statement. The authorization checks
> >>> happen
> >>> >> during Hive query compilation. But as the user is allowed to
> >>> >> execute dfs commands, user-defined functions and shell commands, it
> is
> >>> >> possible to bypass the client security checks.
> >>> >>
> >>> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> >>> >> ndard+Based+Hive+Authorization
> >>> >>
> >>> >> The above problem is the reason Hive introduced a new authorization
> >>> API
> >>> >> called authz2. However, I saw that some of those limitations are
> >>> handled by
> >>> >> Sentry already, such as GRANT privilege checks (on the Sentry server
> >>> side).
> >>> >> Also, Sentry provides the SentryGrantRevokeTask to handle the
> >>> GRANT/REVOKE
> >>> >> execution instead of using the authz1 API that Hive provides.
> >>> >>
> >>> >> Authz1 uses the following configurations:
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <
> >>> coheigea@apache.org>
> >>> >> wrote:
> >>> >>
> >>> >>> Hi Sergio,
> >>> >>>
> >>> >>> Could you give some background as to what the differences are
> between
> >>> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
> >>> >>>
> >>> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2
> >>> with
> >>> >>> Hive 2, so I assumed the separate bindings were related to the Hive
> >>> >>> versions being supported. Obviously this is not the case if we are
> >>> still
> >>> >>> talking about supporting authz1 with Hive 2.0.
> >>> >>>
> >>> >>> Colm.
> >>> >>>
> >>> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <
> >>> sergio.pena@cloudera.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>> > Hi All,
> >>> >>> >
> >>> >>> > We are running into some problems with the support of Hive Authz
> V2
> >>> >>> > especially related to the workaround that parses Hive command
> >>> strings
> >>> >>> in
> >>> >>> > Sentry using regular expressions to get some info that Hive is
> not
> >>> >>> sending
> >>> >>> > through the authz2 api. Hive 2.0 made some changes on commands
> that
> >>> >>> caused
> >>> >>> > issues with Sentry. These are fixed but the concern of doing this
> >>> SQL
> >>> >>> > parsing exists. We asked the Hive community to give us extra SQL
> >>> >>> > information, but we cannot implement them in Sentry until a Hive
> >>> >>> release is
> >>> >>> > done. There are some concerns about the quality of authz2 too,
> >>> such as
> >>> >>> > create/drop table and functions calling Sentry twice for
> >>> authorization
> >>> >>> and
> >>> >>> > the lack of testing being done on it.
> >>> >>> >
> >>> >>> > The original idea for Sentry 2.0 (future release) was to drop
> >>> authz1
> >>> >>> > support and use authz2 as default but the work is getting delayed
> >>> until
> >>> >>> > Hive releases something. Now that we bumped the Hive version to
> >>> 2.0, I
> >>> >>> was
> >>> >>> > wondering if we should continue with authz1 and keep authz2 as an
> >>> >>> > experimental support until Hive releases something we can consume
> >>> to
> >>> >>> fix
> >>> >>> > our issues. Then we can deprecate authz1 in a future 2.x release
> >>> and
> >>> >>> remove
> >>> >>> > it in a major version.
> >>> >>> >
> >>> >>> > I was thinking if we remove any hive-authz2 profile and just add
> >>> the
> >>> >>> > hive-authz2 classes to the current sentry-binding-hive module so
> >>> that
> >>> >>> users
> >>> >>> > are allowed to switch either to v1 or v2 (for testing). Also for
> >>> the
> >>> >>> tests,
> >>> >>> > find a way to run all sentry-tests-hive with v1 and v2 to
> validate
> >>> the
> >>> >>> > quality of it.
> >>> >>> >
> >>> >>> > What does the PMC community think? Is it a good or bad idea?
> >>> >>> >
> >>> >>> > - Sergio
> >>> >>> >
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> Colm O hEigeartaigh
> >>> >>>
> >>> >>> Talend Community Coder
> >>> >>> http://coders.talend.com
> >>> >>>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>>
> >>> --
> >>> Colm O hEigeartaigh
> >>>
> >>> Talend Community Coder
> >>> http://coders.talend.com
> >>>
> >>
> >>
> >
>

Re: Concerns about Hive authz2 support

Posted by Sergio Pena <se...@cloudera.com>.
I was reviewing the hive-authz2 support, and so far the concerns are only
on the checkPrivileges() method that is called by Hive to request access to
a Hive object. This is similar to what HiveAuthzBindingHook does on the
hive-authz1 profile.

So, I think I can do a mix of work. Use the authz2 work for grant/revoke
tasks and filtering HMS metadata + continuing using the
HiveAuthzBindingHook (or Semantic hooks) class to request access to an
object. This way we can use what is already tested by Sentry for years.
Also, these semantic hooks are not part of authz1 in Hive so we are sure
they won't be deleted in the future.

As an extra info on the difference in Hive about the semantic hooks and the
authorization V2 is just the parameters that are passed. Hive calls both,
Semantic hooks and doAuthorizationV2, in the same scope (Driver.compile),
so there is no benefit for us to continue investing effort on fixing the
checkPrivileges and Hive in order to make it work (at least no in the
short-term).

Here's the patch 1 of 2 I'm working on:
https://issues.apache.org/jira/browse/SENTRY-1978

A 2nd patch will include another part of the hive-auth2 for filtering HMS
metadata.
Let me know if you guys have a concern about it

- Sergio



On Tue, Oct 3, 2017 at 4:40 PM, Sergio Pena <se...@cloudera.com>
wrote:

> Aaah, I missed the $hive-v2.version property. I see we've been using Hive
> 2.0 for authz2 for a while.
>
> The concern for authz2 is about parsing Hive SQL strings that could change
> in future Hive versions, and when attempting to remove that, then other
> issues (that Na Li explained) started happening. The current authz2 works
> fine with the SQL parsing workaround that looks weird to do in Sentry.
>
> I thought a little more about the profiles, and it's good to idea keep
> both profiles for a while to avoid more complications later. Still, we need
> to really test this authz2 very well before switching to it as the default.
>
> Regarding Hive fixes, they're currently targeted to Hive 3.0. We can ask a
> backport to Hive 2.x but I don't know when they're going to release a new
> Hive 2.x version. This could delay our Sentry 2.0 + HA release if we want
> to wait for those releases.
>
> What should we do? Should we wait for a Hive release and verify authz2
> works correctly? or should we keep
> authz1 as the default and release something soon?
>
> - Sergio
>
> On Tue, Oct 3, 2017 at 11:33 AM, Na Li <li...@cloudera.com> wrote:
>
>> Colm and Sergio,
>>
>> https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we found
>> when integrating hive with sentry for auth-2.
>>
>> The issues we found for auth-2 when integrating with Hive are:
>>
>> 1. When dropping non-exist database, hive does not pass name of the
>> non-exist database to DefaultSentryValidator.checkPrivileges(). It seems
>> it is hard for hive to fix it.
>> 2. When dropping non-exist table, hive does not pass name of the
>> non-exist table to DefaultSentryValidator.checkPrivileges(). It seems it
>> is hard for hive to fix it.
>> 3. When creating or dropping function, hive does not pass class name of
>> the function. HIVE-17544
>> <https://issues.apache.org/jira/browse/HIVE-17544> is fixing this issue.
>>
>> Sentry should remove parsing hive command, and it requires hive update.
>> We are still working on how to fix those issues.
>>
>> Thanks,
>>
>> Lina
>>
>> On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh <co...@apache.org>
>> wrote:
>>
>>> Thanks for that detailed answer, Sergio. Just to correct you on something
>>> though:
>>>
>>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>>> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>>>
>>> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
>>> e.g.:
>>>
>>> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a5
>>> 3d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79
>>>
>>> I know it works because I tested it with Hive 2.0.0 (
>>> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)
>>>
>>> > Now that we bumped the Hive version to 2.0, I was wondering if we
>>> should
>>> continue with authz1 and keep authz2 as an
>>> > experimental support until Hive releases something we can consume to
>>> fix
>>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>>> remove
>>> > it in a major version.
>>>
>>> Yes I think this makes sense, given the concerns you have raised. Do you
>>> have a timeline on when the Hive issues are likely to be fixed? Maybe it
>>> could be done before Sentry 2.0.0 in which case we could drop authz1
>>> anyway
>>> for 2.0.0?
>>>
>>> > I was thinking if we remove any hive-authz2 profile and just add the
>>> > hive-authz2 classes to the current sentry-binding-hive module so that
>>> users
>>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>>> tests,
>>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>>> > quality of it.
>>>
>>> Removing the hive-authz2 profile though could make it more difficult to
>>> remove the authz1 functionality in the future, as it will be less clear
>>> where the demarcation is between the two. Definitely it makes sense to
>>> run
>>> the tests with both versions if possible.
>>>
>>> Colm.
>>>
>>> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <se...@cloudera.com>
>>> wrote:
>>>
>>> > Sure.
>>> >
>>> > First, here's what Hive Wiki says about authz1 limitations:
>>> >
>>> > The default authorization in Hive
>>> > <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>>> uthorization(LegacyMode)>
>>> >  is *not designed* with the intent to protect against malicious users
>>> > accessing data they should not be accessing. It only helps in
>>> preventing
>>> > users from accidentally doing operations they are not supposed to do.
>>> It is
>>> > also incomplete because it does not have authorization checks for many
>>> > operations including the grant statement. The authorization checks
>>> happen
>>> > during Hive query compilation. But as the user is allowed to
>>> > execute dfs commands, user-defined functions and shell commands, it is
>>> > possible to bypass the client security checks.
>>> >
>>> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>>> > ndard+Based+Hive+Authorization
>>> >
>>> > The above problem is the reason Hive introduced a new authorization API
>>> > called authz2. However, I saw that some of those limitations are
>>> handled by
>>> > Sentry already, such as GRANT privilege checks (on the Sentry server
>>> side).
>>> > Also, Sentry provides the SentryGrantRevokeTask to handle the
>>> GRANT/REVOKE
>>> > execution instead of using the authz1 API that Hive provides.
>>> >
>>> > Authz1 uses the following configurations:
>>> >
>>> >    - *hive.security.authorization.ma
>>> >    <http://hive.security.authorization.ma>nager*=(implementation of
>>> >    HiveAuthorizerFactory)
>>> >
>>> >
>>> >    - *hive.security.authorization.enabled*=true
>>> >
>>> >
>>> > There is, though, a HiveAuthorizerFactory implementation on Sentry when
>>> > bumping the version to Hive 0.13, but it does not provide the
>>> Controller
>>> > nor Validator classes to handle authorization for v1. These classes
>>> were
>>> > introduced in Sentry later to support authz2.
>>> >
>>> > Based on the above, I think that Sentry does not use authz1 completely
>>> as
>>> > it provides the hooks necessary to the semantic analyzer and task
>>> execution
>>> > to provide that support (correct me if I'm wrong).
>>> >
>>> > Nevertheless, the authz2 provides other functionalities that it would
>>> be
>>> > good to support, such as DFS commands authorization and keep HMS client
>>> > filtering, GRANT/REVOKE executions and privileges checks in one class
>>> > (HiveAuthorizerFactory) instead of 3 that Sentry provides.
>>> >
>>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>>> was
>>> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>>> >
>>> > Another proposal is to keep authz1 as default for Sentry 2.0 like
>>> Sentry
>>> > 1.8 provides, and deprecate it later in the Sentry 2.x line once
>>> authz2 is
>>> > stable and we bump to newer versions of Hive 2 with fixes on this.
>>> >
>>> > The configuration difference is:
>>> >
>>> > *Sentry authz1:*
>>> > HMS
>>> >   MetastoreAuthzBinding for HMS server authorization.
>>> >
>>> > HS2
>>> >   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
>>> >   SentryMetaStoreFilterHook for hms client filtering.
>>> >   SentryHiveAuthorizationTaskFactoryImpl that creates the
>>> > SentryGrantRevokeTask.
>>> >   HiveAuthzBindingHook that checks privileges during the semantic
>>> analyzer.
>>> >   SentryGrantRevokeTask that executes adds/removes privileges on the
>>> > Sentry server.
>>> >
>>> > *Sentry authz2:*
>>> > HMS
>>> >   MetastoreAuthzBinding for HMS server authorization.
>>> >
>>> > HS2
>>> >   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
>>> >   SentryHiveAuthorizer that calls a Controller or Validator depending
>>> on
>>> > the authorization request.
>>> >   SentryHiveAccessController to grant/revoke commands.
>>> >   SentryAuthorizationValidator for HMS client filtering and check
>>> > privileges.
>>> >
>>> >
>>> > - Sergio
>>> >
>>> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <sergio.pena@cloudera.com
>>> >
>>> > wrote:
>>> >
>>> >> Sure.
>>> >>
>>> >> First, here's what Hive Wiki says about authz1 limitations:
>>> >>
>>> >> The default authorization in Hive
>>> >> <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>>> uthorization(LegacyMode)>
>>> >>  is *not designed* with the intent to protect against malicious users
>>> >> accessing data they should not be accessing. It only helps in
>>> preventing
>>> >> users from accidentally doing operations they are not supposed to do.
>>> It is
>>> >> also incomplete because it does not have authorization checks for many
>>> >> operations including the grant statement. The authorization checks
>>> happen
>>> >> during Hive query compilation. But as the user is allowed to
>>> >> execute dfs commands, user-defined functions and shell commands, it is
>>> >> possible to bypass the client security checks.
>>> >>
>>> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>>> >> ndard+Based+Hive+Authorization
>>> >>
>>> >> The above problem is the reason Hive introduced a new authorization
>>> API
>>> >> called authz2. However, I saw that some of those limitations are
>>> handled by
>>> >> Sentry already, such as GRANT privilege checks (on the Sentry server
>>> side).
>>> >> Also, Sentry provides the SentryGrantRevokeTask to handle the
>>> GRANT/REVOKE
>>> >> execution instead of using the authz1 API that Hive provides.
>>> >>
>>> >> Authz1 uses the following configurations:
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <
>>> coheigea@apache.org>
>>> >> wrote:
>>> >>
>>> >>> Hi Sergio,
>>> >>>
>>> >>> Could you give some background as to what the differences are between
>>> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>> >>>
>>> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2
>>> with
>>> >>> Hive 2, so I assumed the separate bindings were related to the Hive
>>> >>> versions being supported. Obviously this is not the case if we are
>>> still
>>> >>> talking about supporting authz1 with Hive 2.0.
>>> >>>
>>> >>> Colm.
>>> >>>
>>> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <
>>> sergio.pena@cloudera.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi All,
>>> >>> >
>>> >>> > We are running into some problems with the support of Hive Authz V2
>>> >>> > especially related to the workaround that parses Hive command
>>> strings
>>> >>> in
>>> >>> > Sentry using regular expressions to get some info that Hive is not
>>> >>> sending
>>> >>> > through the authz2 api. Hive 2.0 made some changes on commands that
>>> >>> caused
>>> >>> > issues with Sentry. These are fixed but the concern of doing this
>>> SQL
>>> >>> > parsing exists. We asked the Hive community to give us extra SQL
>>> >>> > information, but we cannot implement them in Sentry until a Hive
>>> >>> release is
>>> >>> > done. There are some concerns about the quality of authz2 too,
>>> such as
>>> >>> > create/drop table and functions calling Sentry twice for
>>> authorization
>>> >>> and
>>> >>> > the lack of testing being done on it.
>>> >>> >
>>> >>> > The original idea for Sentry 2.0 (future release) was to drop
>>> authz1
>>> >>> > support and use authz2 as default but the work is getting delayed
>>> until
>>> >>> > Hive releases something. Now that we bumped the Hive version to
>>> 2.0, I
>>> >>> was
>>> >>> > wondering if we should continue with authz1 and keep authz2 as an
>>> >>> > experimental support until Hive releases something we can consume
>>> to
>>> >>> fix
>>> >>> > our issues. Then we can deprecate authz1 in a future 2.x release
>>> and
>>> >>> remove
>>> >>> > it in a major version.
>>> >>> >
>>> >>> > I was thinking if we remove any hive-authz2 profile and just add
>>> the
>>> >>> > hive-authz2 classes to the current sentry-binding-hive module so
>>> that
>>> >>> users
>>> >>> > are allowed to switch either to v1 or v2 (for testing). Also for
>>> the
>>> >>> tests,
>>> >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate
>>> the
>>> >>> > quality of it.
>>> >>> >
>>> >>> > What does the PMC community think? Is it a good or bad idea?
>>> >>> >
>>> >>> > - Sergio
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Colm O hEigeartaigh
>>> >>>
>>> >>> Talend Community Coder
>>> >>> http://coders.talend.com
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>>
>>> --
>>> Colm O hEigeartaigh
>>>
>>> Talend Community Coder
>>> http://coders.talend.com
>>>
>>
>>
>

Re: Concerns about Hive authz2 support

Posted by Sergio Pena <se...@cloudera.com>.
Aaah, I missed the $hive-v2.version property. I see we've been using Hive
2.0 for authz2 for a while.

The concern for authz2 is about parsing Hive SQL strings that could change
in future Hive versions, and when attempting to remove that, then other
issues (that Na Li explained) started happening. The current authz2 works
fine with the SQL parsing workaround that looks weird to do in Sentry.

I thought a little more about the profiles, and it's good to idea keep both
profiles for a while to avoid more complications later. Still, we need to
really test this authz2 very well before switching to it as the default.

Regarding Hive fixes, they're currently targeted to Hive 3.0. We can ask a
backport to Hive 2.x but I don't know when they're going to release a new
Hive 2.x version. This could delay our Sentry 2.0 + HA release if we want
to wait for those releases.

What should we do? Should we wait for a Hive release and verify authz2
works correctly? or should we keep
authz1 as the default and release something soon?

- Sergio

On Tue, Oct 3, 2017 at 11:33 AM, Na Li <li...@cloudera.com> wrote:

> Colm and Sergio,
>
> https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we found
> when integrating hive with sentry for auth-2.
>
> The issues we found for auth-2 when integrating with Hive are:
>
> 1. When dropping non-exist database, hive does not pass name of the
> non-exist database to DefaultSentryValidator.checkPrivileges(). It seems
> it is hard for hive to fix it.
> 2. When dropping non-exist table, hive does not pass name of the non-exist
> table to DefaultSentryValidator.checkPrivileges(). It seems it is hard
> for hive to fix it.
> 3. When creating or dropping function, hive does not pass class name of
> the function. HIVE-17544
> <https://issues.apache.org/jira/browse/HIVE-17544> is fixing this issue.
>
> Sentry should remove parsing hive command, and it requires hive update. We
> are still working on how to fix those issues.
>
> Thanks,
>
> Lina
>
> On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh <co...@apache.org>
> wrote:
>
>> Thanks for that detailed answer, Sergio. Just to correct you on something
>> though:
>>
>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>>
>> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
>> e.g.:
>>
>> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a5
>> 3d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79
>>
>> I know it works because I tested it with Hive 2.0.0 (
>> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)
>>
>> > Now that we bumped the Hive version to 2.0, I was wondering if we should
>> continue with authz1 and keep authz2 as an
>> > experimental support until Hive releases something we can consume to fix
>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>> remove
>> > it in a major version.
>>
>> Yes I think this makes sense, given the concerns you have raised. Do you
>> have a timeline on when the Hive issues are likely to be fixed? Maybe it
>> could be done before Sentry 2.0.0 in which case we could drop authz1
>> anyway
>> for 2.0.0?
>>
>> > I was thinking if we remove any hive-authz2 profile and just add the
>> > hive-authz2 classes to the current sentry-binding-hive module so that
>> users
>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>> tests,
>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>> > quality of it.
>>
>> Removing the hive-authz2 profile though could make it more difficult to
>> remove the authz1 functionality in the future, as it will be less clear
>> where the demarcation is between the two. Definitely it makes sense to run
>> the tests with both versions if possible.
>>
>> Colm.
>>
>> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <se...@cloudera.com>
>> wrote:
>>
>> > Sure.
>> >
>> > First, here's what Hive Wiki says about authz1 limitations:
>> >
>> > The default authorization in Hive
>> > <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>> uthorization(LegacyMode)>
>> >  is *not designed* with the intent to protect against malicious users
>> > accessing data they should not be accessing. It only helps in preventing
>> > users from accidentally doing operations they are not supposed to do.
>> It is
>> > also incomplete because it does not have authorization checks for many
>> > operations including the grant statement. The authorization checks
>> happen
>> > during Hive query compilation. But as the user is allowed to
>> > execute dfs commands, user-defined functions and shell commands, it is
>> > possible to bypass the client security checks.
>> >
>> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>> > ndard+Based+Hive+Authorization
>> >
>> > The above problem is the reason Hive introduced a new authorization API
>> > called authz2. However, I saw that some of those limitations are
>> handled by
>> > Sentry already, such as GRANT privilege checks (on the Sentry server
>> side).
>> > Also, Sentry provides the SentryGrantRevokeTask to handle the
>> GRANT/REVOKE
>> > execution instead of using the authz1 API that Hive provides.
>> >
>> > Authz1 uses the following configurations:
>> >
>> >    - *hive.security.authorization.ma
>> >    <http://hive.security.authorization.ma>nager*=(implementation of
>> >    HiveAuthorizerFactory)
>> >
>> >
>> >    - *hive.security.authorization.enabled*=true
>> >
>> >
>> > There is, though, a HiveAuthorizerFactory implementation on Sentry when
>> > bumping the version to Hive 0.13, but it does not provide the Controller
>> > nor Validator classes to handle authorization for v1. These classes were
>> > introduced in Sentry later to support authz2.
>> >
>> > Based on the above, I think that Sentry does not use authz1 completely
>> as
>> > it provides the hooks necessary to the semantic analyzer and task
>> execution
>> > to provide that support (correct me if I'm wrong).
>> >
>> > Nevertheless, the authz2 provides other functionalities that it would be
>> > good to support, such as DFS commands authorization and keep HMS client
>> > filtering, GRANT/REVOKE executions and privileges checks in one class
>> > (HiveAuthorizerFactory) instead of 3 that Sentry provides.
>> >
>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>> was
>> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>> >
>> > Another proposal is to keep authz1 as default for Sentry 2.0 like Sentry
>> > 1.8 provides, and deprecate it later in the Sentry 2.x line once authz2
>> is
>> > stable and we bump to newer versions of Hive 2 with fixes on this.
>> >
>> > The configuration difference is:
>> >
>> > *Sentry authz1:*
>> > HMS
>> >   MetastoreAuthzBinding for HMS server authorization.
>> >
>> > HS2
>> >   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
>> >   SentryMetaStoreFilterHook for hms client filtering.
>> >   SentryHiveAuthorizationTaskFactoryImpl that creates the
>> > SentryGrantRevokeTask.
>> >   HiveAuthzBindingHook that checks privileges during the semantic
>> analyzer.
>> >   SentryGrantRevokeTask that executes adds/removes privileges on the
>> > Sentry server.
>> >
>> > *Sentry authz2:*
>> > HMS
>> >   MetastoreAuthzBinding for HMS server authorization.
>> >
>> > HS2
>> >   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
>> >   SentryHiveAuthorizer that calls a Controller or Validator depending on
>> > the authorization request.
>> >   SentryHiveAccessController to grant/revoke commands.
>> >   SentryAuthorizationValidator for HMS client filtering and check
>> > privileges.
>> >
>> >
>> > - Sergio
>> >
>> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <se...@cloudera.com>
>> > wrote:
>> >
>> >> Sure.
>> >>
>> >> First, here's what Hive Wiki says about authz1 limitations:
>> >>
>> >> The default authorization in Hive
>> >> <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>> uthorization(LegacyMode)>
>> >>  is *not designed* with the intent to protect against malicious users
>> >> accessing data they should not be accessing. It only helps in
>> preventing
>> >> users from accidentally doing operations they are not supposed to do.
>> It is
>> >> also incomplete because it does not have authorization checks for many
>> >> operations including the grant statement. The authorization checks
>> happen
>> >> during Hive query compilation. But as the user is allowed to
>> >> execute dfs commands, user-defined functions and shell commands, it is
>> >> possible to bypass the client security checks.
>> >>
>> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>> >> ndard+Based+Hive+Authorization
>> >>
>> >> The above problem is the reason Hive introduced a new authorization API
>> >> called authz2. However, I saw that some of those limitations are
>> handled by
>> >> Sentry already, such as GRANT privilege checks (on the Sentry server
>> side).
>> >> Also, Sentry provides the SentryGrantRevokeTask to handle the
>> GRANT/REVOKE
>> >> execution instead of using the authz1 API that Hive provides.
>> >>
>> >> Authz1 uses the following configurations:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <
>> coheigea@apache.org>
>> >> wrote:
>> >>
>> >>> Hi Sergio,
>> >>>
>> >>> Could you give some background as to what the differences are between
>> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>> >>>
>> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2
>> with
>> >>> Hive 2, so I assumed the separate bindings were related to the Hive
>> >>> versions being supported. Obviously this is not the case if we are
>> still
>> >>> talking about supporting authz1 with Hive 2.0.
>> >>>
>> >>> Colm.
>> >>>
>> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <
>> sergio.pena@cloudera.com>
>> >>> wrote:
>> >>>
>> >>> > Hi All,
>> >>> >
>> >>> > We are running into some problems with the support of Hive Authz V2
>> >>> > especially related to the workaround that parses Hive command
>> strings
>> >>> in
>> >>> > Sentry using regular expressions to get some info that Hive is not
>> >>> sending
>> >>> > through the authz2 api. Hive 2.0 made some changes on commands that
>> >>> caused
>> >>> > issues with Sentry. These are fixed but the concern of doing this
>> SQL
>> >>> > parsing exists. We asked the Hive community to give us extra SQL
>> >>> > information, but we cannot implement them in Sentry until a Hive
>> >>> release is
>> >>> > done. There are some concerns about the quality of authz2 too, such
>> as
>> >>> > create/drop table and functions calling Sentry twice for
>> authorization
>> >>> and
>> >>> > the lack of testing being done on it.
>> >>> >
>> >>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>> >>> > support and use authz2 as default but the work is getting delayed
>> until
>> >>> > Hive releases something. Now that we bumped the Hive version to
>> 2.0, I
>> >>> was
>> >>> > wondering if we should continue with authz1 and keep authz2 as an
>> >>> > experimental support until Hive releases something we can consume to
>> >>> fix
>> >>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>> >>> remove
>> >>> > it in a major version.
>> >>> >
>> >>> > I was thinking if we remove any hive-authz2 profile and just add the
>> >>> > hive-authz2 classes to the current sentry-binding-hive module so
>> that
>> >>> users
>> >>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>> >>> tests,
>> >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate
>> the
>> >>> > quality of it.
>> >>> >
>> >>> > What does the PMC community think? Is it a good or bad idea?
>> >>> >
>> >>> > - Sergio
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Colm O hEigeartaigh
>> >>>
>> >>> Talend Community Coder
>> >>> http://coders.talend.com
>> >>>
>> >>
>> >>
>> >
>>
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>

Re: Concerns about Hive authz2 support

Posted by Na Li <li...@cloudera.com>.
Colm and Sergio,

https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we found
when integrating hive with sentry for auth-2.

The issues we found for auth-2 when integrating with Hive are:

1. When dropping non-exist database, hive does not pass name of the
non-exist database to DefaultSentryValidator.checkPrivileges(). It seems it
is hard for hive to fix it.
2. When dropping non-exist table, hive does not pass name of the non-exist
table to DefaultSentryValidator.checkPrivileges(). It seems it is hard for
hive to fix it.
3. When creating or dropping function, hive does not pass class name of the
function. HIVE-17544 <https://issues.apache.org/jira/browse/HIVE-17544> is
fixing this issue.

Sentry should remove parsing hive command, and it requires hive update. We
are still working on how to fix those issues.

Thanks,

Lina

On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh <co...@apache.org>
wrote:

> Thanks for that detailed answer, Sergio. Just to correct you on something
> though:
>
> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>
> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
> e.g.:
>
> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a53d49b558
> bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79
>
> I know it works because I tested it with Hive 2.0.0 (
> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)
>
> > Now that we bumped the Hive version to 2.0, I was wondering if we should
> continue with authz1 and keep authz2 as an
> > experimental support until Hive releases something we can consume to fix
> > our issues. Then we can deprecate authz1 in a future 2.x release and
> remove
> > it in a major version.
>
> Yes I think this makes sense, given the concerns you have raised. Do you
> have a timeline on when the Hive issues are likely to be fixed? Maybe it
> could be done before Sentry 2.0.0 in which case we could drop authz1 anyway
> for 2.0.0?
>
> > I was thinking if we remove any hive-authz2 profile and just add the
> > hive-authz2 classes to the current sentry-binding-hive module so that
> users
> > are allowed to switch either to v1 or v2 (for testing). Also for the
> tests,
> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
> > quality of it.
>
> Removing the hive-authz2 profile though could make it more difficult to
> remove the authz1 functionality in the future, as it will be less clear
> where the demarcation is between the two. Definitely it makes sense to run
> the tests with both versions if possible.
>
> Colm.
>
> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <se...@cloudera.com>
> wrote:
>
> > Sure.
> >
> > First, here's what Hive Wiki says about authz1 limitations:
> >
> > The default authorization in Hive
> > <https://cwiki.apache.org/confluence/display/Hive/
> LanguageManual+Authorization#LanguageManualAuthorization-
> 3DefaultHiveAuthorization(LegacyMode)>
> >  is *not designed* with the intent to protect against malicious users
> > accessing data they should not be accessing. It only helps in preventing
> > users from accidentally doing operations they are not supposed to do. It
> is
> > also incomplete because it does not have authorization checks for many
> > operations including the grant statement. The authorization checks happen
> > during Hive query compilation. But as the user is allowed to
> > execute dfs commands, user-defined functions and shell commands, it is
> > possible to bypass the client security checks.
> >
> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> > ndard+Based+Hive+Authorization
> >
> > The above problem is the reason Hive introduced a new authorization API
> > called authz2. However, I saw that some of those limitations are handled
> by
> > Sentry already, such as GRANT privilege checks (on the Sentry server
> side).
> > Also, Sentry provides the SentryGrantRevokeTask to handle the
> GRANT/REVOKE
> > execution instead of using the authz1 API that Hive provides.
> >
> > Authz1 uses the following configurations:
> >
> >    - *hive.security.authorization.ma
> >    <http://hive.security.authorization.ma>nager*=(implementation of
> >    HiveAuthorizerFactory)
> >
> >
> >    - *hive.security.authorization.enabled*=true
> >
> >
> > There is, though, a HiveAuthorizerFactory implementation on Sentry when
> > bumping the version to Hive 0.13, but it does not provide the Controller
> > nor Validator classes to handle authorization for v1. These classes were
> > introduced in Sentry later to support authz2.
> >
> > Based on the above, I think that Sentry does not use authz1 completely as
> > it provides the hooks necessary to the semantic analyzer and task
> execution
> > to provide that support (correct me if I'm wrong).
> >
> > Nevertheless, the authz2 provides other functionalities that it would be
> > good to support, such as DFS commands authorization and keep HMS client
> > filtering, GRANT/REVOKE executions and privileges checks in one class
> > (HiveAuthorizerFactory) instead of 3 that Sentry provides.
> >
> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
> was
> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
> >
> > Another proposal is to keep authz1 as default for Sentry 2.0 like Sentry
> > 1.8 provides, and deprecate it later in the Sentry 2.x line once authz2
> is
> > stable and we bump to newer versions of Hive 2 with fixes on this.
> >
> > The configuration difference is:
> >
> > *Sentry authz1:*
> > HMS
> >   MetastoreAuthzBinding for HMS server authorization.
> >
> > HS2
> >   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
> >   SentryMetaStoreFilterHook for hms client filtering.
> >   SentryHiveAuthorizationTaskFactoryImpl that creates the
> > SentryGrantRevokeTask.
> >   HiveAuthzBindingHook that checks privileges during the semantic
> analyzer.
> >   SentryGrantRevokeTask that executes adds/removes privileges on the
> > Sentry server.
> >
> > *Sentry authz2:*
> > HMS
> >   MetastoreAuthzBinding for HMS server authorization.
> >
> > HS2
> >   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
> >   SentryHiveAuthorizer that calls a Controller or Validator depending on
> > the authorization request.
> >   SentryHiveAccessController to grant/revoke commands.
> >   SentryAuthorizationValidator for HMS client filtering and check
> > privileges.
> >
> >
> > - Sergio
> >
> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <se...@cloudera.com>
> > wrote:
> >
> >> Sure.
> >>
> >> First, here's what Hive Wiki says about authz1 limitations:
> >>
> >> The default authorization in Hive
> >> <https://cwiki.apache.org/confluence/display/Hive/
> LanguageManual+Authorization#LanguageManualAuthorization-
> 3DefaultHiveAuthorization(LegacyMode)>
> >>  is *not designed* with the intent to protect against malicious users
> >> accessing data they should not be accessing. It only helps in preventing
> >> users from accidentally doing operations they are not supposed to do.
> It is
> >> also incomplete because it does not have authorization checks for many
> >> operations including the grant statement. The authorization checks
> happen
> >> during Hive query compilation. But as the user is allowed to
> >> execute dfs commands, user-defined functions and shell commands, it is
> >> possible to bypass the client security checks.
> >>
> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> >> ndard+Based+Hive+Authorization
> >>
> >> The above problem is the reason Hive introduced a new authorization API
> >> called authz2. However, I saw that some of those limitations are
> handled by
> >> Sentry already, such as GRANT privilege checks (on the Sentry server
> side).
> >> Also, Sentry provides the SentryGrantRevokeTask to handle the
> GRANT/REVOKE
> >> execution instead of using the authz1 API that Hive provides.
> >>
> >> Authz1 uses the following configurations:
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <
> coheigea@apache.org>
> >> wrote:
> >>
> >>> Hi Sergio,
> >>>
> >>> Could you give some background as to what the differences are between
> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
> >>>
> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
> >>> Hive 2, so I assumed the separate bindings were related to the Hive
> >>> versions being supported. Obviously this is not the case if we are
> still
> >>> talking about supporting authz1 with Hive 2.0.
> >>>
> >>> Colm.
> >>>
> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <sergio.pena@cloudera.com
> >
> >>> wrote:
> >>>
> >>> > Hi All,
> >>> >
> >>> > We are running into some problems with the support of Hive Authz V2
> >>> > especially related to the workaround that parses Hive command strings
> >>> in
> >>> > Sentry using regular expressions to get some info that Hive is not
> >>> sending
> >>> > through the authz2 api. Hive 2.0 made some changes on commands that
> >>> caused
> >>> > issues with Sentry. These are fixed but the concern of doing this SQL
> >>> > parsing exists. We asked the Hive community to give us extra SQL
> >>> > information, but we cannot implement them in Sentry until a Hive
> >>> release is
> >>> > done. There are some concerns about the quality of authz2 too, such
> as
> >>> > create/drop table and functions calling Sentry twice for
> authorization
> >>> and
> >>> > the lack of testing being done on it.
> >>> >
> >>> > The original idea for Sentry 2.0 (future release) was to drop authz1
> >>> > support and use authz2 as default but the work is getting delayed
> until
> >>> > Hive releases something. Now that we bumped the Hive version to 2.0,
> I
> >>> was
> >>> > wondering if we should continue with authz1 and keep authz2 as an
> >>> > experimental support until Hive releases something we can consume to
> >>> fix
> >>> > our issues. Then we can deprecate authz1 in a future 2.x release and
> >>> remove
> >>> > it in a major version.
> >>> >
> >>> > I was thinking if we remove any hive-authz2 profile and just add the
> >>> > hive-authz2 classes to the current sentry-binding-hive module so that
> >>> users
> >>> > are allowed to switch either to v1 or v2 (for testing). Also for the
> >>> tests,
> >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate
> the
> >>> > quality of it.
> >>> >
> >>> > What does the PMC community think? Is it a good or bad idea?
> >>> >
> >>> > - Sergio
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Colm O hEigeartaigh
> >>>
> >>> Talend Community Coder
> >>> http://coders.talend.com
> >>>
> >>
> >>
> >
>
>
> --
> Colm O hEigeartaigh
>
> Talend Community Coder
> http://coders.talend.com
>

Re: Concerns about Hive authz2 support

Posted by Colm O hEigeartaigh <co...@apache.org>.
Thanks for that detailed answer, Sergio. Just to correct you on something
though:

> Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).

The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
e.g.:

https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a53d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79

I know it works because I tested it with Hive 2.0.0 (
http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)

> Now that we bumped the Hive version to 2.0, I was wondering if we should
continue with authz1 and keep authz2 as an
> experimental support until Hive releases something we can consume to fix
> our issues. Then we can deprecate authz1 in a future 2.x release and
remove
> it in a major version.

Yes I think this makes sense, given the concerns you have raised. Do you
have a timeline on when the Hive issues are likely to be fixed? Maybe it
could be done before Sentry 2.0.0 in which case we could drop authz1 anyway
for 2.0.0?

> I was thinking if we remove any hive-authz2 profile and just add the
> hive-authz2 classes to the current sentry-binding-hive module so that
users
> are allowed to switch either to v1 or v2 (for testing). Also for the
tests,
> find a way to run all sentry-tests-hive with v1 and v2 to validate the
> quality of it.

Removing the hive-authz2 profile though could make it more difficult to
remove the authz1 functionality in the future, as it will be less clear
where the demarcation is between the two. Definitely it makes sense to run
the tests with both versions if possible.

Colm.

On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <se...@cloudera.com>
wrote:

> Sure.
>
> First, here's what Hive Wiki says about authz1 limitations:
>
> The default authorization in Hive
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>  is *not designed* with the intent to protect against malicious users
> accessing data they should not be accessing. It only helps in preventing
> users from accidentally doing operations they are not supposed to do. It is
> also incomplete because it does not have authorization checks for many
> operations including the grant statement. The authorization checks happen
> during Hive query compilation. But as the user is allowed to
> execute dfs commands, user-defined functions and shell commands, it is
> possible to bypass the client security checks.
>
> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> ndard+Based+Hive+Authorization
>
> The above problem is the reason Hive introduced a new authorization API
> called authz2. However, I saw that some of those limitations are handled by
> Sentry already, such as GRANT privilege checks (on the Sentry server side).
> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
> execution instead of using the authz1 API that Hive provides.
>
> Authz1 uses the following configurations:
>
>    - *hive.security.authorization.ma
>    <http://hive.security.authorization.ma>nager*=(implementation of
>    HiveAuthorizerFactory)
>
>
>    - *hive.security.authorization.enabled*=true
>
>
> There is, though, a HiveAuthorizerFactory implementation on Sentry when
> bumping the version to Hive 0.13, but it does not provide the Controller
> nor Validator classes to handle authorization for v1. These classes were
> introduced in Sentry later to support authz2.
>
> Based on the above, I think that Sentry does not use authz1 completely as
> it provides the hooks necessary to the semantic analyzer and task execution
> to provide that support (correct me if I'm wrong).
>
> Nevertheless, the authz2 provides other functionalities that it would be
> good to support, such as DFS commands authorization and keep HMS client
> filtering, GRANT/REVOKE executions and privileges checks in one class
> (HiveAuthorizerFactory) instead of 3 that Sentry provides.
>
> Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it was
> Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>
> Another proposal is to keep authz1 as default for Sentry 2.0 like Sentry
> 1.8 provides, and deprecate it later in the Sentry 2.x line once authz2 is
> stable and we bump to newer versions of Hive 2 with fixes on this.
>
> The configuration difference is:
>
> *Sentry authz1:*
> HMS
>   MetastoreAuthzBinding for HMS server authorization.
>
> HS2
>   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
>   SentryMetaStoreFilterHook for hms client filtering.
>   SentryHiveAuthorizationTaskFactoryImpl that creates the
> SentryGrantRevokeTask.
>   HiveAuthzBindingHook that checks privileges during the semantic analyzer.
>   SentryGrantRevokeTask that executes adds/removes privileges on the
> Sentry server.
>
> *Sentry authz2:*
> HMS
>   MetastoreAuthzBinding for HMS server authorization.
>
> HS2
>   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
>   SentryHiveAuthorizer that calls a Controller or Validator depending on
> the authorization request.
>   SentryHiveAccessController to grant/revoke commands.
>   SentryAuthorizationValidator for HMS client filtering and check
> privileges.
>
>
> - Sergio
>
> On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <se...@cloudera.com>
> wrote:
>
>> Sure.
>>
>> First, here's what Hive Wiki says about authz1 limitations:
>>
>> The default authorization in Hive
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>>  is *not designed* with the intent to protect against malicious users
>> accessing data they should not be accessing. It only helps in preventing
>> users from accidentally doing operations they are not supposed to do. It is
>> also incomplete because it does not have authorization checks for many
>> operations including the grant statement. The authorization checks happen
>> during Hive query compilation. But as the user is allowed to
>> execute dfs commands, user-defined functions and shell commands, it is
>> possible to bypass the client security checks.
>>
>> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>> ndard+Based+Hive+Authorization
>>
>> The above problem is the reason Hive introduced a new authorization API
>> called authz2. However, I saw that some of those limitations are handled by
>> Sentry already, such as GRANT privilege checks (on the Sentry server side).
>> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
>> execution instead of using the authz1 API that Hive provides.
>>
>> Authz1 uses the following configurations:
>>
>>
>>
>>
>>
>> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <co...@apache.org>
>> wrote:
>>
>>> Hi Sergio,
>>>
>>> Could you give some background as to what the differences are between
>>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>>
>>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
>>> Hive 2, so I assumed the separate bindings were related to the Hive
>>> versions being supported. Obviously this is not the case if we are still
>>> talking about supporting authz1 with Hive 2.0.
>>>
>>> Colm.
>>>
>>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <se...@cloudera.com>
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > We are running into some problems with the support of Hive Authz V2
>>> > especially related to the workaround that parses Hive command strings
>>> in
>>> > Sentry using regular expressions to get some info that Hive is not
>>> sending
>>> > through the authz2 api. Hive 2.0 made some changes on commands that
>>> caused
>>> > issues with Sentry. These are fixed but the concern of doing this SQL
>>> > parsing exists. We asked the Hive community to give us extra SQL
>>> > information, but we cannot implement them in Sentry until a Hive
>>> release is
>>> > done. There are some concerns about the quality of authz2 too, such as
>>> > create/drop table and functions calling Sentry twice for authorization
>>> and
>>> > the lack of testing being done on it.
>>> >
>>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>>> > support and use authz2 as default but the work is getting delayed until
>>> > Hive releases something. Now that we bumped the Hive version to 2.0, I
>>> was
>>> > wondering if we should continue with authz1 and keep authz2 as an
>>> > experimental support until Hive releases something we can consume to
>>> fix
>>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>>> remove
>>> > it in a major version.
>>> >
>>> > I was thinking if we remove any hive-authz2 profile and just add the
>>> > hive-authz2 classes to the current sentry-binding-hive module so that
>>> users
>>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>>> tests,
>>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>>> > quality of it.
>>> >
>>> > What does the PMC community think? Is it a good or bad idea?
>>> >
>>> > - Sergio
>>> >
>>>
>>>
>>>
>>> --
>>> Colm O hEigeartaigh
>>>
>>> Talend Community Coder
>>> http://coders.talend.com
>>>
>>
>>
>


-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com

Re: Concerns about Hive authz2 support

Posted by Sergio Pena <se...@cloudera.com>.
Sure.

First, here's what Hive Wiki says about authz1 limitations:

The default authorization in Hive
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
 is *not designed* with the intent to protect against malicious users
accessing data they should not be accessing. It only helps in preventing
users from accidentally doing operations they are not supposed to do. It is
also incomplete because it does not have authorization checks for many
operations including the grant statement. The authorization checks happen
during Hive query compilation. But as the user is allowed to
execute dfs commands, user-defined functions and shell commands, it is
possible to bypass the client security checks.

See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
ndard+Based+Hive+Authorization

The above problem is the reason Hive introduced a new authorization API
called authz2. However, I saw that some of those limitations are handled by
Sentry already, such as GRANT privilege checks (on the Sentry server side).
Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
execution instead of using the authz1 API that Hive provides.

Authz1 uses the following configurations:

   - *hive.security.authorization.ma
   <http://hive.security.authorization.ma>nager*=(implementation of
   HiveAuthorizerFactory)


   - *hive.security.authorization.enabled*=true


There is, though, a HiveAuthorizerFactory implementation on Sentry when
bumping the version to Hive 0.13, but it does not provide the Controller
nor Validator classes to handle authorization for v1. These classes were
introduced in Sentry later to support authz2.

Based on the above, I think that Sentry does not use authz1 completely as
it provides the hooks necessary to the semantic analyzer and task execution
to provide that support (correct me if I'm wrong).

Nevertheless, the authz2 provides other functionalities that it would be
good to support, such as DFS commands authorization and keep HMS client
filtering, GRANT/REVOKE executions and privileges checks in one class
(HiveAuthorizerFactory) instead of 3 that Sentry provides.

Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it was
Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).

Another proposal is to keep authz1 as default for Sentry 2.0 like Sentry
1.8 provides, and deprecate it later in the Sentry 2.x line once authz2 is
stable and we bump to newer versions of Hive 2 with fixes on this.

The configuration difference is:

*Sentry authz1:*
HMS
  MetastoreAuthzBinding for HMS server authorization.

HS2
  HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
  SentryMetaStoreFilterHook for hms client filtering.
  SentryHiveAuthorizationTaskFactoryImpl that creates the
SentryGrantRevokeTask.
  HiveAuthzBindingHook that checks privileges during the semantic analyzer.
  SentryGrantRevokeTask that executes adds/removes privileges on the Sentry
server.

*Sentry authz2:*
HMS
  MetastoreAuthzBinding for HMS server authorization.

HS2
  HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
  SentryHiveAuthorizer that calls a Controller or Validator depending on
the authorization request.
  SentryHiveAccessController to grant/revoke commands.
  SentryAuthorizationValidator for HMS client filtering and check
privileges.


- Sergio

On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <se...@cloudera.com>
wrote:

> Sure.
>
> First, here's what Hive Wiki says about authz1 limitations:
>
> The default authorization in Hive
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>  is *not designed* with the intent to protect against malicious users
> accessing data they should not be accessing. It only helps in preventing
> users from accidentally doing operations they are not supposed to do. It is
> also incomplete because it does not have authorization checks for many
> operations including the grant statement. The authorization checks happen
> during Hive query compilation. But as the user is allowed to
> execute dfs commands, user-defined functions and shell commands, it is
> possible to bypass the client security checks.
>
> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
> ndard+Based+Hive+Authorization
>
> The above problem is the reason Hive introduced a new authorization API
> called authz2. However, I saw that some of those limitations are handled by
> Sentry already, such as GRANT privilege checks (on the Sentry server side).
> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
> execution instead of using the authz1 API that Hive provides.
>
> Authz1 uses the following configurations:
>
>
>
>
>
> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <co...@apache.org>
> wrote:
>
>> Hi Sergio,
>>
>> Could you give some background as to what the differences are between
>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>
>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
>> Hive 2, so I assumed the separate bindings were related to the Hive
>> versions being supported. Obviously this is not the case if we are still
>> talking about supporting authz1 with Hive 2.0.
>>
>> Colm.
>>
>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <se...@cloudera.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > We are running into some problems with the support of Hive Authz V2
>> > especially related to the workaround that parses Hive command strings in
>> > Sentry using regular expressions to get some info that Hive is not
>> sending
>> > through the authz2 api. Hive 2.0 made some changes on commands that
>> caused
>> > issues with Sentry. These are fixed but the concern of doing this SQL
>> > parsing exists. We asked the Hive community to give us extra SQL
>> > information, but we cannot implement them in Sentry until a Hive
>> release is
>> > done. There are some concerns about the quality of authz2 too, such as
>> > create/drop table and functions calling Sentry twice for authorization
>> and
>> > the lack of testing being done on it.
>> >
>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>> > support and use authz2 as default but the work is getting delayed until
>> > Hive releases something. Now that we bumped the Hive version to 2.0, I
>> was
>> > wondering if we should continue with authz1 and keep authz2 as an
>> > experimental support until Hive releases something we can consume to fix
>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>> remove
>> > it in a major version.
>> >
>> > I was thinking if we remove any hive-authz2 profile and just add the
>> > hive-authz2 classes to the current sentry-binding-hive module so that
>> users
>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>> tests,
>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>> > quality of it.
>> >
>> > What does the PMC community think? Is it a good or bad idea?
>> >
>> > - Sergio
>> >
>>
>>
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>

Re: Concerns about Hive authz2 support

Posted by Sergio Pena <se...@cloudera.com>.
Colm, I haven't finished the last email, I pressed a button that sent the
partial email :), sorry. I'm still writing.

On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <se...@cloudera.com>
wrote:

> Sure.
>
> First, here's what Hive Wiki says about authz1 limitations:
>
> The default authorization in Hive
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>  is *not designed* with the intent to protect against malicious users
> accessing data they should not be accessing. It only helps in preventing
> users from accidentally doing operations they are not supposed to do. It is
> also incomplete because it does not have authorization checks for many
> operations including the grant statement. The authorization checks happen
> during Hive query compilation. But as the user is allowed to
> execute dfs commands, user-defined functions and shell commands, it is
> possible to bypass the client security checks.
>
> See https://cwiki.apache.org/confluence/display/Hive/SQL+
> Standard+Based+Hive+Authorization
>
> The above problem is the reason Hive introduced a new authorization API
> called authz2. However, I saw that some of those limitations are handled by
> Sentry already, such as GRANT privilege checks (on the Sentry server side).
> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
> execution instead of using the authz1 API that Hive provides.
>
> Authz1 uses the following configurations:
>
>
>
>
>
> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <co...@apache.org>
> wrote:
>
>> Hi Sergio,
>>
>> Could you give some background as to what the differences are between
>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>
>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
>> Hive 2, so I assumed the separate bindings were related to the Hive
>> versions being supported. Obviously this is not the case if we are still
>> talking about supporting authz1 with Hive 2.0.
>>
>> Colm.
>>
>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <se...@cloudera.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > We are running into some problems with the support of Hive Authz V2
>> > especially related to the workaround that parses Hive command strings in
>> > Sentry using regular expressions to get some info that Hive is not
>> sending
>> > through the authz2 api. Hive 2.0 made some changes on commands that
>> caused
>> > issues with Sentry. These are fixed but the concern of doing this SQL
>> > parsing exists. We asked the Hive community to give us extra SQL
>> > information, but we cannot implement them in Sentry until a Hive
>> release is
>> > done. There are some concerns about the quality of authz2 too, such as
>> > create/drop table and functions calling Sentry twice for authorization
>> and
>> > the lack of testing being done on it.
>> >
>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>> > support and use authz2 as default but the work is getting delayed until
>> > Hive releases something. Now that we bumped the Hive version to 2.0, I
>> was
>> > wondering if we should continue with authz1 and keep authz2 as an
>> > experimental support until Hive releases something we can consume to fix
>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>> remove
>> > it in a major version.
>> >
>> > I was thinking if we remove any hive-authz2 profile and just add the
>> > hive-authz2 classes to the current sentry-binding-hive module so that
>> users
>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>> tests,
>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>> > quality of it.
>> >
>> > What does the PMC community think? Is it a good or bad idea?
>> >
>> > - Sergio
>> >
>>
>>
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>

Re: Concerns about Hive authz2 support

Posted by Sergio Pena <se...@cloudera.com>.
Sure.

First, here's what Hive Wiki says about authz1 limitations:

The default authorization in Hive
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
 is *not designed* with the intent to protect against malicious users
accessing data they should not be accessing. It only helps in preventing
users from accidentally doing operations they are not supposed to do. It is
also incomplete because it does not have authorization checks for many
operations including the grant statement. The authorization checks happen
during Hive query compilation. But as the user is allowed to
execute dfs commands, user-defined functions and shell commands, it is
possible to bypass the client security checks.

See
https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization

The above problem is the reason Hive introduced a new authorization API
called authz2. However, I saw that some of those limitations are handled by
Sentry already, such as GRANT privilege checks (on the Sentry server side).
Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
execution instead of using the authz1 API that Hive provides.

Authz1 uses the following configurations:





On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <co...@apache.org>
wrote:

> Hi Sergio,
>
> Could you give some background as to what the differences are between
> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>
> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
> Hive 2, so I assumed the separate bindings were related to the Hive
> versions being supported. Obviously this is not the case if we are still
> talking about supporting authz1 with Hive 2.0.
>
> Colm.
>
> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <se...@cloudera.com>
> wrote:
>
> > Hi All,
> >
> > We are running into some problems with the support of Hive Authz V2
> > especially related to the workaround that parses Hive command strings in
> > Sentry using regular expressions to get some info that Hive is not
> sending
> > through the authz2 api. Hive 2.0 made some changes on commands that
> caused
> > issues with Sentry. These are fixed but the concern of doing this SQL
> > parsing exists. We asked the Hive community to give us extra SQL
> > information, but we cannot implement them in Sentry until a Hive release
> is
> > done. There are some concerns about the quality of authz2 too, such as
> > create/drop table and functions calling Sentry twice for authorization
> and
> > the lack of testing being done on it.
> >
> > The original idea for Sentry 2.0 (future release) was to drop authz1
> > support and use authz2 as default but the work is getting delayed until
> > Hive releases something. Now that we bumped the Hive version to 2.0, I
> was
> > wondering if we should continue with authz1 and keep authz2 as an
> > experimental support until Hive releases something we can consume to fix
> > our issues. Then we can deprecate authz1 in a future 2.x release and
> remove
> > it in a major version.
> >
> > I was thinking if we remove any hive-authz2 profile and just add the
> > hive-authz2 classes to the current sentry-binding-hive module so that
> users
> > are allowed to switch either to v1 or v2 (for testing). Also for the
> tests,
> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
> > quality of it.
> >
> > What does the PMC community think? Is it a good or bad idea?
> >
> > - Sergio
> >
>
>
>
> --
> Colm O hEigeartaigh
>
> Talend Community Coder
> http://coders.talend.com
>

Re: Concerns about Hive authz2 support

Posted by Colm O hEigeartaigh <co...@apache.org>.
Hi Sergio,

Could you give some background as to what the differences are between
"authz1" and "authz2"? Sorry if this is an obvious question :-)

For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
Hive 2, so I assumed the separate bindings were related to the Hive
versions being supported. Obviously this is not the case if we are still
talking about supporting authz1 with Hive 2.0.

Colm.

On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <se...@cloudera.com>
wrote:

> Hi All,
>
> We are running into some problems with the support of Hive Authz V2
> especially related to the workaround that parses Hive command strings in
> Sentry using regular expressions to get some info that Hive is not sending
> through the authz2 api. Hive 2.0 made some changes on commands that caused
> issues with Sentry. These are fixed but the concern of doing this SQL
> parsing exists. We asked the Hive community to give us extra SQL
> information, but we cannot implement them in Sentry until a Hive release is
> done. There are some concerns about the quality of authz2 too, such as
> create/drop table and functions calling Sentry twice for authorization and
> the lack of testing being done on it.
>
> The original idea for Sentry 2.0 (future release) was to drop authz1
> support and use authz2 as default but the work is getting delayed until
> Hive releases something. Now that we bumped the Hive version to 2.0, I was
> wondering if we should continue with authz1 and keep authz2 as an
> experimental support until Hive releases something we can consume to fix
> our issues. Then we can deprecate authz1 in a future 2.x release and remove
> it in a major version.
>
> I was thinking if we remove any hive-authz2 profile and just add the
> hive-authz2 classes to the current sentry-binding-hive module so that users
> are allowed to switch either to v1 or v2 (for testing). Also for the tests,
> find a way to run all sentry-tests-hive with v1 and v2 to validate the
> quality of it.
>
> What does the PMC community think? Is it a good or bad idea?
>
> - Sergio
>



-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com