Posted to dev@kudu.apache.org by Grant Henke <gh...@cloudera.com.INVALID> on 2020/03/10 15:59:52 UTC

Re: Hive 3 and Apache Ranger

Hello Kudu developers,

With a large majority of the Apache Ranger integration work landing, I
wanted to revisit this plan and potentially move forward removing Sentry
and upgrading Hive.

Steps 1 through 3 from the plan have been completed. Below are some related
commits:

   1. Commit the Hive 3 preparation patch to simplify upgrading in the
   future
      - https://github.com/apache/kudu/commit/76b80e
      - https://github.com/apache/kudu/commit/ab69837
      - https://github.com/apache/kudu/commit/e20487a
      - https://github.com/apache/kudu/commit/d96f8fc
      - https://github.com/apache/kudu/commit/8fb170b
   2. Verify the feasibility of upgrading with the mentioned POC patches,
   but do not commit them.
      - https://gerrit.cloudera.org/#/c/14020/
      - https://gerrit.cloudera.org/#/c/13256/
   3. Start work on an Apache Ranger integration for Kudu.
      - https://github.com/apache/kudu/commit/f392503
      - https://github.com/apache/kudu/commit/2ea8478
      - https://github.com/apache/kudu/commit/e13fd4a
      - https://github.com/apache/kudu/commit/0d29977


Since the writing of the original email, Sentry has not added support for
Hive 3. This means that step 4 is not an option at this time. As a result, I
propose to move forward with step 5 and start removing Sentry support and
upgrading to Hive 3.

The planned steps are as follows:

   1. Rebase and commit the patch to disable Sentry tests
   2. Rebase and commit the patch to upgrade to Hive 3
   3. Document the removal of Sentry support in the release notes.
   4. Remove Sentry tests and code.
      - This can be done after the next release.
      - Though the tests are removed and we use Hive 3, we still technically
      work with Hive 2 and Sentry. We are no longer testing/validating it,
      though.

Please let me know if you have any thoughts or feedback on the above plan.

Thank you,
Grant


On Tue, Aug 13, 2019 at 3:06 AM Adar Lieber-Dembo <ad...@cloudera.com.invalid>
wrote:

> +1, thanks for all of the details.
>
> On Fri, Aug 9, 2019 at 3:21 PM Grant Henke <gh...@cloudera.com> wrote:
> >
> > Hello Kudu developers,
> >
> > Recently I have started work on upgrading Kudu to use Apache Hive 3.x.
> > Given this is a major upgrade, it does come with some challenges. As of
> > Kudu 1.10.0 we use Hive in the HMS synchronization feature. This feature
> > includes a Kudu server-side notification listener and HMS client. It also
> > includes a Java-side HMS plugin to enforce Kudu imperatives within the
> > HMS. That feature is useful on its own in many ways, but is also required
> > for fine-grained authorization via Apache Sentry.
> >
> > The primary challenge is that Apache Sentry currently does not support
> > Hive 3, and it will likely take a large effort to enable support. It is
> > also unclear if there is anyone in the Sentry community that wants to
> > contribute and release such support.
> >
> > I have started preliminary efforts to support Hive 3 in Kudu and the HMS
> > synchronization feature. This includes 3 patches. The first patch
> > <https://gerrit.cloudera.org/#/c/14018/> contains changes that work in
> > both Hive 2 and Hive 3 and minimize the work needed when we upgrade in
> > the future. This can be committed to master when reviewed and ready. The
> > second patch <https://gerrit.cloudera.org/#/c/14006> disables the Sentry
> > integration so I can test the changes required to support HMS
> > synchronization on its own. Those changes and testing are the third patch
> > <https://gerrit.cloudera.org/#/c/13256/>.
> >
> > Given fine-grained authorization is a critical feature for many users, we
> > can't remove Sentry support without providing an alternative
> > authorization implementation. At the same time we have started work on
> > authorization via Apache Ranger. Once that implementation exists and has
> > been contributed/released, we can make a decision about how to move
> > forward.
> >
> > Given what we know today and the current situation, here is my suggested
> > plan:
> >
> >    1. Commit the Hive 3 preparation patch to simplify upgrading in the
> >    future
> >    2. Verify the feasibility of upgrading with the mentioned POC patches,
> >    but do not commit them.
> >       - This means we will remain on Hive 2 until step 4 or 5 below.
> >    3. Start work on an Apache Ranger integration for Kudu.
> >    4. If Hive 3 support is added in Sentry, consider upgrading to Hive 3
> >    then.
> >    5. When Ranger support is complete, consider removing Sentry support
> >    in favor of Ranger and upgrade to Hive 3.
> >       - This may require a migration path from Sentry to Ranger.
> >
> > Please let me know if you have any thoughts or feedback on the above
> > plan.
> >
> > Thank you,
> > Grant
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Hive 3 and Apache Ranger

Posted by Grant Henke <gh...@cloudera.com.INVALID>.
>
> though I appreciate there's a counterbalancing tension here (i.e. folks
> that want more confidence
> around Kudu's Hive 3 support in this release).


Yes, this is the main motivation for doing some of the work before the
release. Given we don't expect new Sentry changes in this release, I
thought it would be safe to disable the tests and upgrade Hive while still
maintaining Sentry support. Then, immediately after the release, we can work
on the actual removal.








-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Hive 3 and Apache Ranger

Posted by Adar Lieber-Dembo <ad...@cloudera.com.INVALID>.
I agree with the broad strokes of the plan, but since the Ranger
support is still quite new, I'd prefer if we did one release
supporting both Sentry and Ranger, and then removed Sentry in the next
release, at which point we'd expect the Ranger support to be more
robust. To me, that means deferring all the remaining work you've
outlined to the next release, though I appreciate there's a
counterbalancing tension here (i.e. folks that want more confidence
around Kudu's Hive 3 support in this release).


Re: Hive 3 and Apache Ranger

Posted by Hao Hao <ha...@cloudera.com.INVALID>.
The plan looks good to me, thanks a lot Grant for the proposal!

Best,
Hao
