Posted to dev@sentry.apache.org by Sergio Pena <se...@cloudera.com.INVALID> on 2018/09/04 20:09:18 UTC

Re: [DISCUSS] How to move to Hive 3.x and Hive 4.x support on Sentry?

I see a problem with having different profiles. Let's say we want to support
Hive2, Hive3, and Hive4 once it is released, but we also want to support
Hadoop3 and Hadoop2 (for compatibility with Hive2). Which versions are we
going to use for building the jars and releasing Sentry? With profiles we
can only choose one Hive and one Hadoop (and what about other components,
such as Kafka and Solr?).

The above will become a nightmare if we want to follow the approach of
keeping Sentry compatible with several versions.

I was looking into how SLF4J works, and it looks like a good idea to
replicate in Sentry. SLF4J is just a facade, an abstract API for logging
that uses whichever logging binding implementation is available on the
classpath, such as Log4j, JDK Logging, etc. It initially uses the desired
logging implementation at compile time, but you can replace the
implementation jars at runtime.

For Sentry, we could decide to use Hive3 (latest release) and Hadoop3 at
compile time and work with the abstracted API to interact with them. But if
a user wants to use Hive2 and Hadoop2, they can replace the Hive3/Hadoop3
jars with the 2.x ones and restart Sentry, without recompiling anything.
This would work pretty nicely. We should do some compatibility testing
between those versions, though, to make sure we don't break compatibility
(some automated tests should be in place for this).
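
To make the facade idea concrete, here is a minimal sketch in Java. Everything
in it is an assumption made for illustration: the HiveBinding interface, the
marker class names, and the select() helper are hypothetical and are not
existing Sentry or Hive APIs. It only shows the general shape: Sentry code
talks to one version-neutral interface, and the binding is chosen by probing
which jars are on the classpath at startup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class BindingFacadeSketch {

    /** Hypothetical version-neutral API that the rest of Sentry would code against. */
    public interface HiveBinding {
        String hiveLine();
        boolean supportsSetOwner();
    }

    /** Returns true if the named class can be loaded by the given class loader. */
    public static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Walks the candidates in insertion order and returns the first binding
     * whose marker class is found on the classpath.
     */
    public static HiveBinding select(Map<String, Supplier<HiveBinding>> candidates,
                                     ClassLoader loader) {
        for (Map.Entry<String, Supplier<HiveBinding>> entry : candidates.entrySet()) {
            if (isPresent(entry.getKey(), loader)) {
                return entry.getValue().get();
            }
        }
        throw new IllegalStateException("No supported Hive jars found on the classpath");
    }

    /** Tiny helper that builds a binding; a real binding would wrap Hive classes. */
    public static HiveBinding binding(String line, boolean setOwner) {
        return new HiveBinding() {
            @Override public String hiveLine() { return line; }
            @Override public boolean supportsSetOwner() { return setOwner; }
        };
    }

    public static void main(String[] args) {
        Map<String, Supplier<HiveBinding>> candidates = new LinkedHashMap<>();
        // Placeholder marker name, not a real Hive class; it will not be found.
        candidates.put("example.hive3.Marker", () -> binding("3.x", true));
        // java.lang.String stands in for "a class shipped in the Hive 2 jars".
        candidates.put("java.lang.String", () -> binding("2.x", false));

        HiveBinding chosen = select(candidates, BindingFacadeSketch.class.getClassLoader());
        System.out.println("Selected binding for Hive " + chosen.hiveLine()
                + ", SET OWNER supported: " + chosen.supportsSetOwner());
    }
}
```

The newest binding is listed first so that it wins when several sets of jars
are present. Features that only exist in newer Hive lines, such as ALTER TABLE
SET OWNER, would be exposed as capability checks on the interface rather than
as compile-time dependencies.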

It sounds like a good amount of work to do this, but what do you think
about it? We could use profiles just for the sake of compiling a jar with a
specific version, but what about releasing binaries?

- Sergio

On Tue, Aug 28, 2018 at 11:12 AM Stephen Moist <mo...@cloudera.com.invalid>
wrote:

> I don’t think we should do multiple profiles.  I think we should just
> support Hive through hive-binding-v2, hive-binding-v3, and hive-binding-v4
> and have them be their own Maven modules.  Customers can decide which
> version they want.  We can add Thrift calls that are specific to a binding
> or amend them with optional parameters.  With this, do we need to support
> previous minor versions or just the latest version?
>
> > On Aug 27, 2018, at 1:34 PM, Brian Towles <bt...@cloudera.com.INVALID> wrote:
> >
> > It would seem that there shouldn't be an issue doing a different binding
> > for Hive 3 as well as Hive 4 if there is that much difference.  It would
> > allow the Hive 2 one to be maintained and eventually EOL'd once it
> > is no longer useful.  This sort of pattern can be used across multiple
> > versions of almost anything there is a Sentry binding for.
> >
> > On Mon, Aug 27, 2018 at 11:09 AM Kalyan Kumar Kalvagadda
> > <kk...@cloudera.com.invalid> wrote:
> >
> >> Sergio,
> >>
> >> We could add profiles for multiple versions of Hive for as long as we
> >> want to support multiple versions. We should also consider supporting
> >> Hadoop-3.
> >> We could take a similar approach to support multiple versions of Hadoop as
> >> well.
> >>
> >>
> >> On Mon, Aug 27, 2018 at 10:22 AM Sergio Pena
> >> <se...@cloudera.com.invalid> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I'd like to discuss how we can start integrating Apache Hive 3.x and
> >>> future 4.x into our Sentry versions. I've noticed that Hive has released
> >>> two versions of 3.x this year (3.0 on May/18, 3.1 on Jul/18), and the
> >>> current development version is 4.0.
> >>>
> >>> Currently, Sentry supports only Hive 2.x, and there is no news of new
> >>> minor versions on that line being released by the community. However,
> >>> there are still companies using Hive 2.x due to its stability and API
> >>> compatibility. Also, I foresee a lot of incompatibility problems once
> >>> Hive 4.0 is released.
> >>>
> >>> So, the question: how can we keep up with the latest Hive releases
> >>> while staying compatible with the Hive versions that are still actively
> >>> used by the Sentry community? Btw, Hive 3.x has features that allow
> >>> Sentry to support ALTER TABLE SET OWNER commands to transfer object
> >>> ownership.
> >>>
> >>> Hive had a shims interface for Hadoop to support Hadoop 1.x and Hadoop
> >>> 2.x in the past. Should we use something similar?
> >>>
> >>> Any other ideas?
> >>>
> >>> - Sergio
> >>>
> >>
> > --
> > *Brian Towles* | Software Engineer
>
>

Re: [DISCUSS] How to move to Hive 3.x and Hive 4.x support on Sentry?

Posted by Kalyan Kumar Kalvagadda <kk...@cloudera.com.INVALID>.

Sergio,

What do you mean by "work with the abstracted API"?

I see a couple of issues here:

   1. This approach works only if there are no functional code changes
   needed to integrate with Hive2 and Hadoop3, right?
   2. How about transitive dependencies? For example: Hive2 and Hive1 might
   have different dependencies. Just replacing Hive-2 jars with Hive-1 jars at
   run time might fail because of missing jars that Hive-1 needs.
   3. How do we make sure that the code changes made in the Sentry repo work
   with older versions of Hive/Hadoop if the unit tests are only run with
   the latest versions of Hive/Hadoop?
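
As a rough illustration of the second point, the sketch below probes the
classpath for a list of required classes and reports the ones that cannot be
loaded, so that a swapped-in set of jars fails fast at startup instead of
midway through a request. The class and method names are hypothetical, and the
required-class list is a placeholder, not a real list of Hive dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathProbeSketch {

    /** Returns the subset of requiredClasses that the loader cannot resolve. */
    public static List<String> missing(List<String> requiredClasses, ClassLoader loader) {
        List<String> notFound = new ArrayList<>();
        for (String name : requiredClasses) {
            try {
                // Load without initializing, so no static initializers run.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Placeholder names: one class that always exists, one that never does.
        List<String> required = Arrays.asList(
                "java.util.List",
                "example.hive1.OnlyDependency");
        List<String> notFound = missing(required, ClasspathProbeSketch.class.getClassLoader());
        System.out.println("Missing classes: " + notFound);
    }
}
```

A probe like this only catches classes that are named explicitly; it does not
prove that every transitive dependency is present, so it complements rather
than replaces running the test suite against each supported Hive/Hadoop
version.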


-Kalyan