You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by Brian Towles <bt...@cloudera.com> on 2017/11/15 16:24:14 UTC

Shading of conflicting dependencies

Howdy all,

An issue that keeps coming up seems to be the conflict of dependency
versions between Sentry and the components it is plugging into. A current
example of this impact is Google Guava with hive2 using v14 and Impala
using v11 while Sentry needs to have at least v14 in order to fix some bugs
in the BoneCP library.  These sort of conflicts come in whenever we are
embedding a Sentry plugin or using a Sentry library into another project.

I would like to propose a mechanism for offsetting some of these issues
using something similar to the third party shading that HBase uses for
common problems (such as Guava)
(https://github.com/apache/hbase-thirdparty with
documentation about it http://hbase.apache.org/book.html#thirdparty) and is
used by HBase and Hadoop for packaging of their downstream artifacts (
https://github.com/apache/hbase/blob/master/hbase-shaded/ and
https://github.com/apache/hadoop/tree/trunk/hadoop-client-modules/hadoop-client-api/
)

The main benefit of this is that it would allow Sentry to be used as
libraries and plugins with all of the dependencies needed for Sentry to be
abstracted away from components implementing it. Sentry could rev versions
of libraries easier and not have collisions of library versions needed by
the implementing component.  As well this would potentially make Sentry
downstream usage more stable since it would be used and tested against a
static set of dependencies and not using libraries based on what the
implementing component has.

On the downside, it would make the on disk size of the Sentry plugins and
libraries for downstream larger. As well, the number of classes loaded into
memory would be larger since there would be potential duplication of actual
class implementations or multiple versions of a class with different
package names in memory.  But this seems to be a common practice and the
lack of library version collisions and stability make up for these
downsides.

The third party shading works by using the Maven shade plugin to do package
name shifting of the third party library (Guava, BoneCP) to a sentry
specific package using the version on the third party library needed for
Sentry.

E.G. *com.google.common* packages could be shifted to
*org.apache.sentry.shaded.com.google.common*.

Since the Maven Shade plugin can actually change the byte-codes of the
libraries being shaded, we can do this even for dependencies that have
shared sub-dependencies .  BoneCP being the main example since it uses
Guava internal to itself.  We could shade the BoneCP into the same shared
sentry third party dependency jar and since the bytecode level
manipulation.

What this means on the development side is that we would need to reference
imports from the shaded third party

E.G. *import com.google.common.collect.Maps* becomes *import
org.apache.sentry.shaded.com.google.common.collect.Maps*.

Ive been looking at this in the context of
https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this should
potentially be something that is more of an overall Sentry standard
practice and larger scale implementation.

-=Brian


-- 
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Re: Shading of conflicting dependencies

Posted by Brian Towles <bt...@cloudera.com>.
@Moist
It can be found here
https://github.com/apache/hbase/blob/master/hbase-shaded/hbase-shaded-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh

On Wed, Nov 15, 2017 at 3:38 PM Brian Towles <bt...@cloudera.com> wrote:

> HBase does have a ensure-jars-have-correct-contents.sh check that they run
> which we could do something similar to.
>
> On Wed, Nov 15, 2017 at 3:30 PM Stephen Moist <mo...@cloudera.com> wrote:
>
>> Is there any way to add a check to the maven build to make sure that the
>> classes being imported are the shaded ones?
>>
>>
>> > On Nov 15, 2017, at 3:02 PM, Brian Towles <bt...@cloudera.com> wrote:
>> >
>> > I can see that.  I can look into that as well.  It would make sense for
>> the
>> > "Sentry as a Library" you talked about as well
>> >
>> > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <ak...@cloudera.com>
>> > wrote:
>> >
>> >> Brian, There are cases where shading is useful for server as well -
>> mostly
>> >> for e2e tests which run server, HDFS, Hive all in the same JVM.
>> >>
>> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <bt...@cloudera.com>
>> >> wrote:
>> >>
>> >>> So there would be two different levels of shading.
>> >>>
>> >>> The first would be the known issues shared repackaged dependencies
>> like
>> >>> Guava.  This would only be for libraries that we absolutely know will
>> >> cause
>> >>> collisions with implementing services and implementing third party
>> >>> libraries that need to have the references to the trouble causing
>> >> libraries
>> >>> shifted.  These trouble causing libraries would be the only ones where
>> >> hard
>> >>> coded import path changes would need to take place for.
>> >>>
>> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module
>> would
>> >>> shared google guava shifting the package to
>> >>> org.apache.sentry.thirdparty.com.google.common. As well as it would
>> shade
>> >>> the BoneCP jar so that it picks up the shift of the Guava package but
>> not
>> >>> shifting the BoneCP packages unless we encounter and a known conflict.
>> >>>
>> >>> The second level os shading would be for the plugins distribution and
>> >> would
>> >>> shade and package shift only the dependencies needed for that plugin.
>> >> This
>> >>> would not need to have any import statement changes as the maven shade
>> >>> plugin would change the bytecode package references on packaging.  As
>> to
>> >>> Kalyan's question, the run time memory used for class loading would
>> only
>> >> be
>> >>> for those items used by the plugin, which is really not much anyways
>> >>> comparatively. The class loader doesn't load up a class until its
>> needed.
>> >>> Till that point it is just a reference to a file in a jar.  The list
>> of
>> >>> files/classes in jars is not a heavy memory load.
>> >>>
>> >>> The main server implementation would not need to have the second level
>> >>> shading done to it as it runs stand alone and is unaffected by the
>> >>> collisions issues.
>> >>>
>> >>> @Alexander For your questions, there is a small amount of refactoring
>> of
>> >>> import statements that need to take place for the first level of
>> shading
>> >>> but there are no logic changes that take place.
>> >>>
>> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it
>> does
>> >> not
>> >>> appear in any of the Sentry dependency chains right now.
>> >>>
>> >>> -=Brian
>> >>>
>> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com>
>> >> wrote:
>> >>>
>> >>>> +1
>> >>>>
>> >>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all
>> jars
>> >>>> rather than known issue jars.  One thing though is what if there are
>> >> bug
>> >>>> fixes in our jars that are not included in other components, it may
>> >> cause
>> >>>> subtle error that are hard to track.  I’m thinking along the lines of
>> >> the
>> >>>> KeyValue class change in behavior.  So I propose limited the scope of
>> >>>> shaded jars to known issues.
>> >>>>
>> >>>> However one thing to note is that some jars do not like to be shaded,
>> >> as
>> >>>> an example BouncyCastle has a check to see if the jar has been
>> modified
>> >>> and
>> >>>> will refuse to be used in a shaded jar.
>> >>>>
>> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
>> >>>> kkalyan@cloudera.com> wrote:
>> >>>>>
>> >>>>> +1
>> >>>>>
>> >>>>> It's good approach but I have a question/concern.
>> >>>>> Is the proposal to shade is for some specific jars OR to shade all
>> >> the
>> >>>>> third party jars?
>> >>>>>
>> >>>>> If proposal is shade all the third-party jars then if would impact
>> >> the
>> >>>> run
>> >>>>> time memory usage as all the classes from the third-party jars would
>> >> be
>> >>>>> loaded regardless..
>> >>>>>
>> >>>>> -Kalyan
>> >>>>>
>> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
>> >>> sergio.pena@cloudera.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> +1
>> >>>>>>
>> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
>> >> when
>> >>>>>> other components break due to Sentry being a plugin. I was once
>> >>> thought
>> >>>> of
>> >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
>> >>> guava14
>> >>>> on
>> >>>>>> hive and the server, etc), but that is too much to do and not good.
>> >> I
>> >>>> like
>> >>>>>> the hbase idea better so we can use the latest libraries that we
>> >> want
>> >>>>>> instead of depending on other components.
>> >>>>>>
>> >>>>>> - Sergio
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
>> >>>> akolb@cloudera.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Agreed, this would be a very useful thing to do. I remember
>> >> spending
>> >>> a
>> >>>>>> lot
>> >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the
>> problem
>> >>> was
>> >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
>> >>>> created
>> >>>>>> a
>> >>>>>>> conflict on the DataNucleus libraries and test failures.
>> >>>>>>>
>> >>>>>>> Looking at the HBase proposal it seems to require some (small)
>> code
>> >>>>>> changes
>> >>>>>>> for package names - or I am misreading it? Do you plan to shade
>> all
>> >>>> 3-rd
>> >>>>>>> party packages or just some? I would consider at least
>> >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
>> >>>>>>>
>> >>>>>>> - Alex
>> >>>>>>>
>> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
>> >> btowles@cloudera.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Howdy all,
>> >>>>>>>>
>> >>>>>>>> An issue that keeps coming up seems to be the conflict of
>> >> dependency
>> >>>>>>>> versions between Sentry and the components it is plugging into. A
>> >>>>>> current
>> >>>>>>>> example of this impact is Google Guava with hive2 using v14 and
>> >>> Impala
>> >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
>> >>> some
>> >>>>>>> bugs
>> >>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
>> >> we
>> >>>> are
>> >>>>>>>> embedding a Sentry plugin or using a Sentry library into another
>> >>>>>> project.
>> >>>>>>>>
>> >>>>>>>> I would like to propose a mechanism for offsetting some of these
>> >>>> issues
>> >>>>>>>> using something similar to the third party shading that HBase
>> uses
>> >>> for
>> >>>>>>>> common problems (such as Guava)
>> >>>>>>>> (https://github.com/apache/hbase-thirdparty with
>> >>>>>>>> documentation about it
>> >> http://hbase.apache.org/book.html#thirdparty
>> >>> )
>> >>>>>> and
>> >>>>>>>> is
>> >>>>>>>> used by HBase and Hadoop for packaging of their downstream
>> >>> artifacts (
>> >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
>> >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
>> >>>>>>>> client-modules/hadoop-client-api/
>> >>>>>>>> )
>> >>>>>>>>
>> >>>>>>>> The main benefit of this is that it would allow Sentry to be used
>> >> as
>> >>>>>>>> libraries and plugins with all of the dependencies needed for
>> >> Sentry
>> >>>> to
>> >>>>>>> be
>> >>>>>>>> abstracted away from components implementing it. Sentry could rev
>> >>>>>>> versions
>> >>>>>>>> of libraries easier and not have collisions of library versions
>> >>> needed
>> >>>>>> by
>> >>>>>>>> the implementing component.  As well this would potentially make
>> >>>> Sentry
>> >>>>>>>> downstream usage more stable since it would be used and tested
>> >>> against
>> >>>>>> a
>> >>>>>>>> static set of dependencies and not using libraries based on what
>> >> the
>> >>>>>>>> implementing component has.
>> >>>>>>>>
>> >>>>>>>> On the downside, it would make the on disk size of the Sentry
>> >>> plugins
>> >>>>>> and
>> >>>>>>>> libraries for downstream larger. As well, the number of classes
>> >>> loaded
>> >>>>>>> into
>> >>>>>>>> memory would be larger since there would be potential duplication
>> >> of
>> >>>>>>> actual
>> >>>>>>>> class implementations or multiple versions of a class with
>> >> different
>> >>>>>>>> package names in memory.  But this seems to be a common practice
>> >> and
>> >>>>>> the
>> >>>>>>>> lack of library version collisions and stability make up for
>> these
>> >>>>>>>> downsides.
>> >>>>>>>>
>> >>>>>>>> The third party shading works by using the Maven shade plugin to
>> >> do
>> >>>>>>> package
>> >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
>> >> sentry
>> >>>>>>>> specific package using the version on the third party library
>> >> needed
>> >>>>>> for
>> >>>>>>>> Sentry.
>> >>>>>>>>
>> >>>>>>>> E.G. *com.google.common* packages could be shifted to
>> >>>>>>>> *org.apache.sentry.shaded.com.google.common*.
>> >>>>>>>>
>> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes
>> of
>> >>> the
>> >>>>>>>> libraries being shaded, we can do this even for dependencies that
>> >>> have
>> >>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
>> >>> uses
>> >>>>>>>> Guava internal to itself.  We could shade the BoneCP into the
>> same
>> >>>>>> shared
>> >>>>>>>> sentry third party dependency jar and since the bytecode level
>> >>>>>>>> manipulation.
>> >>>>>>>>
>> >>>>>>>> What this means on the development side is that we would need to
>> >>>>>>> reference
>> >>>>>>>> imports from the shaded third party
>> >>>>>>>>
>> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
>> >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>> >>>>>>>>
>> >>>>>>>> Ive been looking at this in the context of
>> >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
>> >> this
>> >>>>>>> should
>> >>>>>>>> potentially be something that is more of an overall Sentry
>> >> standard
>> >>>>>>>> practice and larger scale implementation.
>> >>>>>>>>
>> >>>>>>>> -=Brian
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> *Brian Towles* | Software Engineer
>> >>>>>>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>> >>>>>>> jond@cloudera.com>
>> >>>>>>>> cloudera.com <http://www.cloudera.com/>
>> >>>>>>>>
>> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
>> >>>>>>>>
>> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>> >> [image:
>> >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera>
>> [image:
>> >>>>>>> Cloudera
>> >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> >>>>>>>> ------------------------------
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>> --
>> >>> *Brian Towles* | Software Engineer
>> >>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>> >> jond@cloudera.com>
>> >>> cloudera.com <http://www.cloudera.com/>
>> >>>
>> >>> [image: Cloudera] <http://www.cloudera.com/>
>> >>>
>> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> >> Cloudera
>> >>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> >>> ------------------------------
>> >>>
>> >>
>> > --
>> > *Brian Towles* | Software Engineer
>> > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>> jond@cloudera.com>
>> > cloudera.com <http://www.cloudera.com/>
>> >
>> > [image: Cloudera] <http://www.cloudera.com/>
>> >
>> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera
>> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> > ------------------------------
>>
>> --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>
-- 
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Re: Shading of conflicting dependencies

Posted by Brian Towles <bt...@cloudera.com>.
HBase does have a ensure-jars-have-correct-contents.sh check that they run
which we could do something similar to.

On Wed, Nov 15, 2017 at 3:30 PM Stephen Moist <mo...@cloudera.com> wrote:

> Is there any way to add a check to the maven build to make sure that the
> classes being imported are the shaded ones?
>
>
> > On Nov 15, 2017, at 3:02 PM, Brian Towles <bt...@cloudera.com> wrote:
> >
> > I can see that.  I can look into that as well.  It would make sense for
> the
> > "Sentry as a Library" you talked about as well
> >
> > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <ak...@cloudera.com>
> > wrote:
> >
> >> Brian, There are cases where shading is useful for server as well -
> mostly
> >> for e2e tests which run server, HDFS, Hive all in the same JVM.
> >>
> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <bt...@cloudera.com>
> >> wrote:
> >>
> >>> So there would be two different levels of shading.
> >>>
> >>> The first would be the known issues shared repackaged dependencies like
> >>> Guava.  This would only be for libraries that we absolutely know will
> >> cause
> >>> collisions with implementing services and implementing third party
> >>> libraries that need to have the references to the trouble causing
> >> libraries
> >>> shifted.  These trouble causing libraries would be the only ones where
> >> hard
> >>> coded import path changes would need to take place for.
> >>>
> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module
> would
> >>> shared google guava shifting the package to
> >>> org.apache.sentry.thirdparty.com.google.common. As well as it would
> shade
> >>> the BoneCP jar so that it picks up the shift of the Guava package but
> not
> >>> shifting the BoneCP packages unless we encounter and a known conflict.
> >>>
> >>> The second level os shading would be for the plugins distribution and
> >> would
> >>> shade and package shift only the dependencies needed for that plugin.
> >> This
> >>> would not need to have any import statement changes as the maven shade
> >>> plugin would change the bytecode package references on packaging.  As
> to
> >>> Kalyan's question, the run time memory used for class loading would
> only
> >> be
> >>> for those items used by the plugin, which is really not much anyways
> >>> comparatively. The class loader doesn't load up a class until its
> needed.
> >>> Till that point it is just a reference to a file in a jar.  The list of
> >>> files/classes in jars is not a heavy memory load.
> >>>
> >>> The main server implementation would not need to have the second level
> >>> shading done to it as it runs stand alone and is unaffected by the
> >>> collisions issues.
> >>>
> >>> @Alexander For your questions, there is a small amount of refactoring
> of
> >>> import statements that need to take place for the first level of
> shading
> >>> but there are no logic changes that take place.
> >>>
> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it does
> >> not
> >>> appear in any of the Sentry dependency chains right now.
> >>>
> >>> -=Brian
> >>>
> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com>
> >> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> >>>> rather than known issue jars.  One thing though is what if there are
> >> bug
> >>>> fixes in our jars that are not included in other components, it may
> >> cause
> >>>> subtle error that are hard to track.  I’m thinking along the lines of
> >> the
> >>>> KeyValue class change in behavior.  So I propose limited the scope of
> >>>> shaded jars to known issues.
> >>>>
> >>>> However one thing to note is that some jars do not like to be shaded,
> >> as
> >>>> an example BouncyCastle has a check to see if the jar has been
> modified
> >>> and
> >>>> will refuse to be used in a shaded jar.
> >>>>
> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> >>>> kkalyan@cloudera.com> wrote:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> It's good approach but I have a question/concern.
> >>>>> Is the proposal to shade is for some specific jars OR to shade all
> >> the
> >>>>> third party jars?
> >>>>>
> >>>>> If proposal is shade all the third-party jars then if would impact
> >> the
> >>>> run
> >>>>> time memory usage as all the classes from the third-party jars would
> >> be
> >>>>> loaded regardless..
> >>>>>
> >>>>> -Kalyan
> >>>>>
> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> >>> sergio.pena@cloudera.com>
> >>>>> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
> >> when
> >>>>>> other components break due to Sentry being a plugin. I was once
> >>> thought
> >>>> of
> >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
> >>> guava14
> >>>> on
> >>>>>> hive and the server, etc), but that is too much to do and not good.
> >> I
> >>>> like
> >>>>>> the hbase idea better so we can use the latest libraries that we
> >> want
> >>>>>> instead of depending on other components.
> >>>>>>
> >>>>>> - Sergio
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> >>>> akolb@cloudera.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Agreed, this would be a very useful thing to do. I remember
> >> spending
> >>> a
> >>>>>> lot
> >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the problem
> >>> was
> >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
> >>>> created
> >>>>>> a
> >>>>>>> conflict on the DataNucleus libraries and test failures.
> >>>>>>>
> >>>>>>> Looking at the HBase proposal it seems to require some (small) code
> >>>>>> changes
> >>>>>>> for package names - or I am misreading it? Do you plan to shade all
> >>>> 3-rd
> >>>>>>> party packages or just some? I would consider at least
> >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
> >>>>>>>
> >>>>>>> - Alex
> >>>>>>>
> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
> >> btowles@cloudera.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Howdy all,
> >>>>>>>>
> >>>>>>>> An issue that keeps coming up seems to be the conflict of
> >> dependency
> >>>>>>>> versions between Sentry and the components it is plugging into. A
> >>>>>> current
> >>>>>>>> example of this impact is Google Guava with hive2 using v14 and
> >>> Impala
> >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
> >>> some
> >>>>>>> bugs
> >>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
> >> we
> >>>> are
> >>>>>>>> embedding a Sentry plugin or using a Sentry library into another
> >>>>>> project.
> >>>>>>>>
> >>>>>>>> I would like to propose a mechanism for offsetting some of these
> >>>> issues
> >>>>>>>> using something similar to the third party shading that HBase uses
> >>> for
> >>>>>>>> common problems (such as Guava)
> >>>>>>>> (https://github.com/apache/hbase-thirdparty with
> >>>>>>>> documentation about it
> >> http://hbase.apache.org/book.html#thirdparty
> >>> )
> >>>>>> and
> >>>>>>>> is
> >>>>>>>> used by HBase and Hadoop for packaging of their downstream
> >>> artifacts (
> >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> >>>>>>>> client-modules/hadoop-client-api/
> >>>>>>>> )
> >>>>>>>>
> >>>>>>>> The main benefit of this is that it would allow Sentry to be used
> >> as
> >>>>>>>> libraries and plugins with all of the dependencies needed for
> >> Sentry
> >>>> to
> >>>>>>> be
> >>>>>>>> abstracted away from components implementing it. Sentry could rev
> >>>>>>> versions
> >>>>>>>> of libraries easier and not have collisions of library versions
> >>> needed
> >>>>>> by
> >>>>>>>> the implementing component.  As well this would potentially make
> >>>> Sentry
> >>>>>>>> downstream usage more stable since it would be used and tested
> >>> against
> >>>>>> a
> >>>>>>>> static set of dependencies and not using libraries based on what
> >> the
> >>>>>>>> implementing component has.
> >>>>>>>>
> >>>>>>>> On the downside, it would make the on disk size of the Sentry
> >>> plugins
> >>>>>> and
> >>>>>>>> libraries for downstream larger. As well, the number of classes
> >>> loaded
> >>>>>>> into
> >>>>>>>> memory would be larger since there would be potential duplication
> >> of
> >>>>>>> actual
> >>>>>>>> class implementations or multiple versions of a class with
> >> different
> >>>>>>>> package names in memory.  But this seems to be a common practice
> >> and
> >>>>>> the
> >>>>>>>> lack of library version collisions and stability make up for these
> >>>>>>>> downsides.
> >>>>>>>>
> >>>>>>>> The third party shading works by using the Maven shade plugin to
> >> do
> >>>>>>> package
> >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
> >> sentry
> >>>>>>>> specific package using the version on the third party library
> >> needed
> >>>>>> for
> >>>>>>>> Sentry.
> >>>>>>>>
> >>>>>>>> E.G. *com.google.common* packages could be shifted to
> >>>>>>>> *org.apache.sentry.shaded.com.google.common*.
> >>>>>>>>
> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes of
> >>> the
> >>>>>>>> libraries being shaded, we can do this even for dependencies that
> >>> have
> >>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
> >>> uses
> >>>>>>>> Guava internal to itself.  We could shade the BoneCP into the same
> >>>>>> shared
> >>>>>>>> sentry third party dependency jar and since the bytecode level
> >>>>>>>> manipulation.
> >>>>>>>>
> >>>>>>>> What this means on the development side is that we would need to
> >>>>>>> reference
> >>>>>>>> imports from the shaded third party
> >>>>>>>>
> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
> >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> >>>>>>>>
> >>>>>>>> Ive been looking at this in the context of
> >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
> >> this
> >>>>>>> should
> >>>>>>>> potentially be something that is more of an overall Sentry
> >> standard
> >>>>>>>> practice and larger scale implementation.
> >>>>>>>>
> >>>>>>>> -=Brian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Brian Towles* | Software Engineer
> >>>>>>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> >>>>>>> jond@cloudera.com>
> >>>>>>>> cloudera.com <http://www.cloudera.com/>
> >>>>>>>>
> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
> >>>>>>>>
> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> >> [image:
> >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> >>>>>>> Cloudera
> >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> >>>>>>>> ------------------------------
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>> --
> >>> *Brian Towles* | Software Engineer
> >>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> >> jond@cloudera.com>
> >>> cloudera.com <http://www.cloudera.com/>
> >>>
> >>> [image: Cloudera] <http://www.cloudera.com/>
> >>>
> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> >> Cloudera
> >>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> >>> ------------------------------
> >>>
> >>
> > --
> > *Brian Towles* | Software Engineer
> > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> jond@cloudera.com>
> > cloudera.com <http://www.cloudera.com/>
> >
> > [image: Cloudera] <http://www.cloudera.com/>
> >
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
>
> --
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Re: Shading of conflicting dependencies

Posted by Stephen Moist <mo...@cloudera.com>.
Is there any way to add a check to the maven build to make sure that the classes being imported are the shaded ones?


> On Nov 15, 2017, at 3:02 PM, Brian Towles <bt...@cloudera.com> wrote:
> 
> I can see that.  I can look into that as well.  It would make sense for the
> "Sentry as a Library" you talked about as well
> 
> On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <ak...@cloudera.com>
> wrote:
> 
>> Brian, There are cases where shading is useful for server as well - mostly
>> for e2e tests which run server, HDFS, Hive all in the same JVM.
>> 
>> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <bt...@cloudera.com>
>> wrote:
>> 
>>> So there would be two different levels of shading.
>>> 
>>> The first would be the known issues shared repackaged dependencies like
>>> Guava.  This would only be for libraries that we absolutely know will
>> cause
>>> collisions with implementing services and implementing third party
>>> libraries that need to have the references to the trouble causing
>> libraries
>>> shifted.  These trouble causing libraries would be the only ones where
>> hard
>>> coded import path changes would need to take place for.
>>> 
>>> For example a sentry-thridparty/sentry-shaded-miscellaneous module would
>>> shared google guava shifting the package to
>>> org.apache.sentry.thirdparty.com.google.common. As well as it would shade
>>> the BoneCP jar so that it picks up the shift of the Guava package but not
>>> shifting the BoneCP packages unless we encounter and a known conflict.
>>> 
>>> The second level os shading would be for the plugins distribution and
>> would
>>> shade and package shift only the dependencies needed for that plugin.
>> This
>>> would not need to have any import statement changes as the maven shade
>>> plugin would change the bytecode package references on packaging.  As to
>>> Kalyan's question, the run time memory used for class loading would only
>> be
>>> for those items used by the plugin, which is really not much anyways
>>> comparatively. The class loader doesn't load up a class until its needed.
>>> Till that point it is just a reference to a file in a jar.  The list of
>>> files/classes in jars is not a heavy memory load.
>>> 
>>> The main server implementation would not need to have the second level
>>> shading done to it as it runs stand alone and is unaffected by the
>>> collisions issues.
>>> 
>>> @Alexander For your questions, there is a small amount of refactoring of
>>> import statements that need to take place for the first level of shading
>>> but there are no logic changes that take place.
>>> 
>>> @Stephen Yeah I can see BouncyCastle being more of an issue but it does
>> not
>>> appear in any of the Sentry dependency chains right now.
>>> 
>>> -=Brian
>>> 
>>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com>
>> wrote:
>>> 
>>>> +1
>>>> 
>>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
>>>> rather than known issue jars.  One thing though is what if there are
>> bug
>>>> fixes in our jars that are not included in other components, it may
>> cause
>>>> subtle error that are hard to track.  I’m thinking along the lines of
>> the
>>>> KeyValue class change in behavior.  So I propose limited the scope of
>>>> shaded jars to known issues.
>>>> 
>>>> However one thing to note is that some jars do not like to be shaded,
>> as
>>>> an example BouncyCastle has a check to see if the jar has been modified
>>> and
>>>> will refuse to be used in a shaded jar.
>>>> 
>>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
>>>> kkalyan@cloudera.com> wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> It's good approach but I have a question/concern.
>>>>> Is the proposal to shade is for some specific jars OR to shade all
>> the
>>>>> third party jars?
>>>>> 
>>>>> If proposal is shade all the third-party jars then if would impact
>> the
>>>> run
>>>>> time memory usage as all the classes from the third-party jars would
>> be
>>>>> loaded regardless..
>>>>> 
>>>>> -Kalyan
>>>>> 
>>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
>>> sergio.pena@cloudera.com>
>>>>> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
>> when
>>>>>> other components break due to Sentry being a plugin. I was once
>>> thought
>>>> of
>>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
>>> guava14
>>>> on
>>>>>> hive and the server, etc), but that is too much to do and not good.
>> I
>>>> like
>>>>>> the hbase idea better so we can use the latest libraries that we
>> want
>>>>>> instead of depending on other components.
>>>>>> 
>>>>>> - Sergio
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
>>>> akolb@cloudera.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Agreed, this would be a very useful thing to do. I remember
>> spending
>>> a
>>>>>> lot
>>>>>>> of time trying to make Sentry work with DataNucleus 4 - the problem
>>> was
>>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
>>>> created
>>>>>> a
>>>>>>> conflict on the DataNucleus libraries and test failures.
>>>>>>> 
>>>>>>> Looking at the HBase proposal it seems to require some (small) code
>>>>>> changes
>>>>>>> for package names - or I am misreading it? Do you plan to shade all
>>>> 3-rd
>>>>>>> party packages or just some? I would consider at least
>>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
>>>>>>> 
>>>>>>> - Alex
>>>>>>> 
>>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
>> btowles@cloudera.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Howdy all,
>>>>>>>> 
>>>>>>>> An issue that keeps coming up seems to be the conflict of
>> dependency
>>>>>>>> versions between Sentry and the components it is plugging into. A
>>>>>> current
>>>>>>>> example of this impact is Google Guava with hive2 using v14 and
>>> Impala
>>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
>>> some
>>>>>>> bugs
>>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
>> we
>>>> are
>>>>>>>> embedding a Sentry plugin or using a Sentry library into another
>>>>>> project.
>>>>>>>> 
>>>>>>>> I would like to propose a mechanism for offsetting some of these
>>>> issues
>>>>>>>> using something similar to the third party shading that HBase uses
>>> for
>>>>>>>> common problems (such as Guava)
>>>>>>>> (https://github.com/apache/hbase-thirdparty with
>>>>>>>> documentation about it
>> http://hbase.apache.org/book.html#thirdparty
>>> )
>>>>>> and
>>>>>>>> is
>>>>>>>> used by HBase and Hadoop for packaging of their downstream
>>> artifacts (
>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
>>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
>>>>>>>> client-modules/hadoop-client-api/
>>>>>>>> )
>>>>>>>> 
>>>>>>>> The main benefit of this is that it would allow Sentry to be used
>> as
>>>>>>>> libraries and plugins with all of the dependencies needed for
>> Sentry
>>>> to
>>>>>>> be
>>>>>>>> abstracted away from components implementing it. Sentry could rev
>>>>>>> versions
>>>>>>>> of libraries easier and not have collisions of library versions
>>> needed
>>>>>> by
>>>>>>>> the implementing component.  As well this would potentially make
>>>> Sentry
>>>>>>>> downstream usage more stable since it would be used and tested
>>> against
>>>>>> a
>>>>>>>> static set of dependencies and not using libraries based on what
>> the
>>>>>>>> implementing component has.
>>>>>>>> 
>>>>>>>> On the downside, it would make the on disk size of the Sentry
>>> plugins
>>>>>> and
>>>>>>>> libraries for downstream larger. As well, the number of classes
>>> loaded
>>>>>>> into
>>>>>>>> memory would be larger since there would be potential duplication
>> of
>>>>>>> actual
>>>>>>>> class implementations or multiple versions of a class with
>> different
>>>>>>>> package names in memory.  But this seems to be a common practice
>> and
>>>>>> the
>>>>>>>> lack of library version collisions and stability make up for these
>>>>>>>> downsides.
>>>>>>>> 
>>>>>>>> The third party shading works by using the Maven shade plugin to
>> do
>>>>>>> package
>>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
>> sentry
>>>>>>>> specific package using the version on the third party library
>> needed
>>>>>> for
>>>>>>>> Sentry.
>>>>>>>> 
>>>>>>>> E.G. *com.google.common* packages could be shifted to
>>>>>>>> *org.apache.sentry.shaded.com.google.common*.
>>>>>>>> 
>>>>>>>> Since the Maven Shade plugin can actually change the byte-codes of
>>> the
>>>>>>>> libraries being shaded, we can do this even for dependencies that
>>> have
>>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
>>> uses
>>>>>>>> Guava internal to itself.  We could shade the BoneCP into the same
>>>>>> shared
>>>>>>>> sentry third party dependency jar and since the bytecode level
>>>>>>>> manipulation.
>>>>>>>> 
>>>>>>>> What this means on the development side is that we would need to
>>>>>>> reference
>>>>>>>> imports from the shaded third party
>>>>>>>> 
>>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
>>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>>>>>>>> 
>>>>>>>> Ive been looking at this in the context of
>>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
>> this
>>>>>>> should
>>>>>>>> potentially be something that is more of an overall Sentry
>> standard
>>>>>>>> practice and larger scale implementation.
>>>>>>>> 
>>>>>>>> -=Brian
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Brian Towles* | Software Engineer
>>>>>>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>>>>>>> jond@cloudera.com>
>>>>>>>> cloudera.com <http://www.cloudera.com/>
>>>>>>>> 
>>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
>>>>>>>> 
>>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>> [image:
>>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>>>>>>> Cloudera
>>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>> ------------------------------
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> --
>>> *Brian Towles* | Software Engineer
>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>> jond@cloudera.com>
>>> cloudera.com <http://www.cloudera.com/>
>>> 
>>> [image: Cloudera] <http://www.cloudera.com/>
>>> 
>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera
>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>> ------------------------------
>>> 
>> 
> -- 
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
> 
> [image: Cloudera] <http://www.cloudera.com/>
> 
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------


Re: Shading of conflicting dependencies

Posted by Brian Towles <bt...@cloudera.com>.
I can see that.  I can look into that as well.  It would make sense for the
"Sentry as a Library" you talked about as well

On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <ak...@cloudera.com>
wrote:

> Brian, There are cases where shading is useful for server as well - mostly
> for e2e tests which run server, HDFS, Hive all in the same JVM.
>
> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <bt...@cloudera.com>
> wrote:
>
> > So there would be two different levels of shading.
> >
> > The first would be the known issues shared repackaged dependencies like
> > Guava.  This would only be for libraries that we absolutely know will
> cause
> > collisions with implementing services and implementing third party
> > libraries that need to have the references to the trouble causing
> libraries
> > shifted.  These trouble causing libraries would be the only ones where
> hard
> > coded import path changes would need to take place for.
> >
> > For example a sentry-thridparty/sentry-shaded-miscellaneous module would
> > shared google guava shifting the package to
> > org.apache.sentry.thirdparty.com.google.common. As well as it would shade
> > the BoneCP jar so that it picks up the shift of the Guava package but not
> > shifting the BoneCP packages unless we encounter and a known conflict.
> >
> > The second level os shading would be for the plugins distribution and
> would
> > shade and package shift only the dependencies needed for that plugin.
> This
> > would not need to have any import statement changes as the maven shade
> > plugin would change the bytecode package references on packaging.  As to
> > Kalyan's question, the run time memory used for class loading would only
> be
> > for those items used by the plugin, which is really not much anyways
> > comparatively. The class loader doesn't load up a class until its needed.
> > Till that point it is just a reference to a file in a jar.  The list of
> > files/classes in jars is not a heavy memory load.
> >
> > The main server implementation would not need to have the second level
> > shading done to it as it runs stand alone and is unaffected by the
> > collisions issues.
> >
> > @Alexander For your questions, there is a small amount of refactoring of
> > import statements that need to take place for the first level of shading
> > but there are no logic changes that take place.
> >
> > @Stephen Yeah I can see BouncyCastle being more of an issue but it does
> not
> > appear in any of the Sentry dependency chains right now.
> >
> > -=Brian
> >
> > On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com>
> wrote:
> >
> > > +1
> > >
> > > Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> > > rather than known issue jars.  One thing though is what if there are
> bug
> > > fixes in our jars that are not included in other components, it may
> cause
> > > subtle error that are hard to track.  I’m thinking along the lines of
> the
> > > KeyValue class change in behavior.  So I propose limited the scope of
> > > shaded jars to known issues.
> > >
> > > However one thing to note is that some jars do not like to be shaded,
> as
> > > an example BouncyCastle has a check to see if the jar has been modified
> > and
> > > will refuse to be used in a shaded jar.
> > >
> > > > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> > > kkalyan@cloudera.com> wrote:
> > > >
> > > > +1
> > > >
> > > > It's good approach but I have a question/concern.
> > > > Is the proposal to shade is for some specific jars OR to shade all
> the
> > > > third party jars?
> > > >
> > > > If proposal is shade all the third-party jars then if would impact
> the
> > > run
> > > > time memory usage as all the classes from the third-party jars would
> be
> > > > loaded regardless..
> > > >
> > > > -Kalyan
> > > >
> > > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> > sergio.pena@cloudera.com>
> > > > wrote:
> > > >
> > > >> +1
> > > >>
> > > >> I like the idea. It's hard to upgrade our libraries to newer ones
> when
> > > >> other components break due to Sentry being a plugin. I was once
> > thought
> > > of
> > > >> using different jars versions per plugin (i.e. guava11 on hdfs,
> > guava14
> > > on
> > > >> hive and the server, etc), but that is too much to do and not good.
> I
> > > like
> > > >> the hbase idea better so we can use the latest libraries that we
> want
> > > >> instead of depending on other components.
> > > >>
> > > >> - Sergio
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> > > akolb@cloudera.com>
> > > >> wrote:
> > > >>
> > > >>> Agreed, this would be a very useful thing to do. I remember
> spending
> > a
> > > >> lot
> > > >>> of time trying to make Sentry work with DataNucleus 4 - the problem
> > was
> > > >>> that e2e tests combine Sentry with Hive in the same JVM and this
> > > created
> > > >> a
> > > >>> conflict on the DataNucleus libraries and test failures.
> > > >>>
> > > >>> Looking at the HBase proposal it seems to require some (small) code
> > > >> changes
> > > >>> for package names - or I am misreading it? Do you plan to shade all
> > > 3-rd
> > > >>> party packages or just some? I would consider at least
> > > >>> Guava/ZooKeepeer/DataNucleus/Jetty.
> > > >>>
> > > >>> - Alex
> > > >>>
> > > >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
> btowles@cloudera.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Howdy all,
> > > >>>>
> > > >>>> An issue that keeps coming up seems to be the conflict of
> dependency
> > > >>>> versions between Sentry and the components it is plugging into. A
> > > >> current
> > > >>>> example of this impact is Google Guava with hive2 using v14 and
> > Impala
> > > >>>> using v11 while Sentry needs to have at least v14 in order to fix
> > some
> > > >>> bugs
> > > >>>> in the BoneCP library.  These sort of conflicts come in whenever
> we
> > > are
> > > >>>> embedding a Sentry plugin or using a Sentry library into another
> > > >> project.
> > > >>>>
> > > >>>> I would like to propose a mechanism for offsetting some of these
> > > issues
> > > >>>> using something similar to the third party shading that HBase uses
> > for
> > > >>>> common problems (such as Guava)
> > > >>>> (https://github.com/apache/hbase-thirdparty with
> > > >>>> documentation about it
> http://hbase.apache.org/book.html#thirdparty
> > )
> > > >> and
> > > >>>> is
> > > >>>> used by HBase and Hadoop for packaging of their downstream
> > artifacts (
> > > >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > > >>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> > > >>>> client-modules/hadoop-client-api/
> > > >>>> )
> > > >>>>
> > > >>>> The main benefit of this is that it would allow Sentry to be used
> as
> > > >>>> libraries and plugins with all of the dependencies needed for
> Sentry
> > > to
> > > >>> be
> > > >>>> abstracted away from components implementing it. Sentry could rev
> > > >>> versions
> > > >>>> of libraries easier and not have collisions of library versions
> > needed
> > > >> by
> > > >>>> the implementing component.  As well this would potentially make
> > > Sentry
> > > >>>> downstream usage more stable since it would be used and tested
> > against
> > > >> a
> > > >>>> static set of dependencies and not using libraries based on what
> the
> > > >>>> implementing component has.
> > > >>>>
> > > >>>> On the downside, it would make the on disk size of the Sentry
> > plugins
> > > >> and
> > > >>>> libraries for downstream larger. As well, the number of classes
> > loaded
> > > >>> into
> > > >>>> memory would be larger since there would be potential duplication
> of
> > > >>> actual
> > > >>>> class implementations or multiple versions of a class with
> different
> > > >>>> package names in memory.  But this seems to be a common practice
> and
> > > >> the
> > > >>>> lack of library version collisions and stability make up for these
> > > >>>> downsides.
> > > >>>>
> > > >>>> The third party shading works by using the Maven shade plugin to
> do
> > > >>> package
> > > >>>> name shifting of the third party library (Guava, BoneCP) to a
> sentry
> > > >>>> specific package using the version on the third party library
> needed
> > > >> for
> > > >>>> Sentry.
> > > >>>>
> > > >>>> E.G. *com.google.common* packages could be shifted to
> > > >>>> *org.apache.sentry.shaded.com.google.common*.
> > > >>>>
> > > >>>> Since the Maven Shade plugin can actually change the byte-codes of
> > the
> > > >>>> libraries being shaded, we can do this even for dependencies that
> > have
> > > >>>> shared sub-dependencies .  BoneCP being the main example since it
> > uses
> > > >>>> Guava internal to itself.  We could shade the BoneCP into the same
> > > >> shared
> > > >>>> sentry third party dependency jar and since the bytecode level
> > > >>>> manipulation.
> > > >>>>
> > > >>>> What this means on the development side is that we would need to
> > > >>> reference
> > > >>>> imports from the shaded third party
> > > >>>>
> > > >>>> E.G. *import com.google.common.collect.Maps* becomes *import
> > > >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > > >>>>
> > > >>>> Ive been looking at this in the context of
> > > >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
> this
> > > >>> should
> > > >>>> potentially be something that is more of an overall Sentry
> standard
> > > >>>> practice and larger scale implementation.
> > > >>>>
> > > >>>> -=Brian
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> *Brian Towles* | Software Engineer
> > > >>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> > > >>> jond@cloudera.com>
> > > >>>> cloudera.com <http://www.cloudera.com/>
> > > >>>>
> > > >>>> [image: Cloudera] <http://www.cloudera.com/>
> > > >>>>
> > > >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> [image:
> > > >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > > >>> Cloudera
> > > >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>
> > > >>
> > >
> > > --
> > *Brian Towles* | Software Engineer
> > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> jond@cloudera.com>
> > cloudera.com <http://www.cloudera.com/>
> >
> > [image: Cloudera] <http://www.cloudera.com/>
> >
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
> >
>
-- 
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Re: Shading of conflicting dependencies

Posted by Alexander Kolbasov <ak...@cloudera.com>.
Brian, There are cases where shading is useful for server as well - mostly
for e2e tests which run server, HDFS, Hive all in the same JVM.

On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <bt...@cloudera.com> wrote:

> So there would be two different levels of shading.
>
> The first would be the known issues shared repackaged dependencies like
> Guava.  This would only be for libraries that we absolutely know will cause
> collisions with implementing services and implementing third party
> libraries that need to have the references to the trouble causing libraries
> shifted.  These trouble causing libraries would be the only ones where hard
> coded import path changes would need to take place for.
>
> For example a sentry-thridparty/sentry-shaded-miscellaneous module would
> shared google guava shifting the package to
> org.apache.sentry.thirdparty.com.google.common. As well as it would shade
> the BoneCP jar so that it picks up the shift of the Guava package but not
> shifting the BoneCP packages unless we encounter and a known conflict.
>
> The second level os shading would be for the plugins distribution and would
> shade and package shift only the dependencies needed for that plugin.  This
> would not need to have any import statement changes as the maven shade
> plugin would change the bytecode package references on packaging.  As to
> Kalyan's question, the run time memory used for class loading would only be
> for those items used by the plugin, which is really not much anyways
> comparatively. The class loader doesn't load up a class until its needed.
> Till that point it is just a reference to a file in a jar.  The list of
> files/classes in jars is not a heavy memory load.
>
> The main server implementation would not need to have the second level
> shading done to it as it runs stand alone and is unaffected by the
> collisions issues.
>
> @Alexander For your questions, there is a small amount of refactoring of
> import statements that need to take place for the first level of shading
> but there are no logic changes that take place.
>
> @Stephen Yeah I can see BouncyCastle being more of an issue but it does not
> appear in any of the Sentry dependency chains right now.
>
> -=Brian
>
> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com> wrote:
>
> > +1
> >
> > Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> > rather than known issue jars.  One thing though is what if there are bug
> > fixes in our jars that are not included in other components, it may cause
> > subtle error that are hard to track.  I’m thinking along the lines of the
> > KeyValue class change in behavior.  So I propose limited the scope of
> > shaded jars to known issues.
> >
> > However one thing to note is that some jars do not like to be shaded, as
> > an example BouncyCastle has a check to see if the jar has been modified
> and
> > will refuse to be used in a shaded jar.
> >
> > > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> > kkalyan@cloudera.com> wrote:
> > >
> > > +1
> > >
> > > It's good approach but I have a question/concern.
> > > Is the proposal to shade is for some specific jars OR to shade all the
> > > third party jars?
> > >
> > > If proposal is shade all the third-party jars then if would impact the
> > run
> > > time memory usage as all the classes from the third-party jars would be
> > > loaded regardless..
> > >
> > > -Kalyan
> > >
> > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> sergio.pena@cloudera.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> I like the idea. It's hard to upgrade our libraries to newer ones when
> > >> other components break due to Sentry being a plugin. I was once
> thought
> > of
> > >> using different jars versions per plugin (i.e. guava11 on hdfs,
> guava14
> > on
> > >> hive and the server, etc), but that is too much to do and not good. I
> > like
> > >> the hbase idea better so we can use the latest libraries that we want
> > >> instead of depending on other components.
> > >>
> > >> - Sergio
> > >>
> > >>
> > >>
> > >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> > akolb@cloudera.com>
> > >> wrote:
> > >>
> > >>> Agreed, this would be a very useful thing to do. I remember spending
> a
> > >> lot
> > >>> of time trying to make Sentry work with DataNucleus 4 - the problem
> was
> > >>> that e2e tests combine Sentry with Hive in the same JVM and this
> > created
> > >> a
> > >>> conflict on the DataNucleus libraries and test failures.
> > >>>
> > >>> Looking at the HBase proposal it seems to require some (small) code
> > >> changes
> > >>> for package names - or I am misreading it? Do you plan to shade all
> > 3-rd
> > >>> party packages or just some? I would consider at least
> > >>> Guava/ZooKeepeer/DataNucleus/Jetty.
> > >>>
> > >>> - Alex
> > >>>
> > >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
> > >>> wrote:
> > >>>
> > >>>> Howdy all,
> > >>>>
> > >>>> An issue that keeps coming up seems to be the conflict of dependency
> > >>>> versions between Sentry and the components it is plugging into. A
> > >> current
> > >>>> example of this impact is Google Guava with hive2 using v14 and
> Impala
> > >>>> using v11 while Sentry needs to have at least v14 in order to fix
> some
> > >>> bugs
> > >>>> in the BoneCP library.  These sort of conflicts come in whenever we
> > are
> > >>>> embedding a Sentry plugin or using a Sentry library into another
> > >> project.
> > >>>>
> > >>>> I would like to propose a mechanism for offsetting some of these
> > issues
> > >>>> using something similar to the third party shading that HBase uses
> for
> > >>>> common problems (such as Guava)
> > >>>> (https://github.com/apache/hbase-thirdparty with
> > >>>> documentation about it http://hbase.apache.org/book.html#thirdparty
> )
> > >> and
> > >>>> is
> > >>>> used by HBase and Hadoop for packaging of their downstream
> artifacts (
> > >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > >>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> > >>>> client-modules/hadoop-client-api/
> > >>>> )
> > >>>>
> > >>>> The main benefit of this is that it would allow Sentry to be used as
> > >>>> libraries and plugins with all of the dependencies needed for Sentry
> > to
> > >>> be
> > >>>> abstracted away from components implementing it. Sentry could rev
> > >>> versions
> > >>>> of libraries easier and not have collisions of library versions
> needed
> > >> by
> > >>>> the implementing component.  As well this would potentially make
> > Sentry
> > >>>> downstream usage more stable since it would be used and tested
> against
> > >> a
> > >>>> static set of dependencies and not using libraries based on what the
> > >>>> implementing component has.
> > >>>>
> > >>>> On the downside, it would make the on disk size of the Sentry
> plugins
> > >> and
> > >>>> libraries for downstream larger. As well, the number of classes
> loaded
> > >>> into
> > >>>> memory would be larger since there would be potential duplication of
> > >>> actual
> > >>>> class implementations or multiple versions of a class with different
> > >>>> package names in memory.  But this seems to be a common practice and
> > >> the
> > >>>> lack of library version collisions and stability make up for these
> > >>>> downsides.
> > >>>>
> > >>>> The third party shading works by using the Maven shade plugin to do
> > >>> package
> > >>>> name shifting of the third party library (Guava, BoneCP) to a sentry
> > >>>> specific package using the version on the third party library needed
> > >> for
> > >>>> Sentry.
> > >>>>
> > >>>> E.G. *com.google.common* packages could be shifted to
> > >>>> *org.apache.sentry.shaded.com.google.common*.
> > >>>>
> > >>>> Since the Maven Shade plugin can actually change the byte-codes of
> the
> > >>>> libraries being shaded, we can do this even for dependencies that
> have
> > >>>> shared sub-dependencies .  BoneCP being the main example since it
> uses
> > >>>> Guava internal to itself.  We could shade the BoneCP into the same
> > >> shared
> > >>>> sentry third party dependency jar and since the bytecode level
> > >>>> manipulation.
> > >>>>
> > >>>> What this means on the development side is that we would need to
> > >>> reference
> > >>>> imports from the shaded third party
> > >>>>
> > >>>> E.G. *import com.google.common.collect.Maps* becomes *import
> > >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > >>>>
> > >>>> Ive been looking at this in the context of
> > >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> > >>> should
> > >>>> potentially be something that is more of an overall Sentry standard
> > >>>> practice and larger scale implementation.
> > >>>>
> > >>>> -=Brian
> > >>>>
> > >>>>
> > >>>> --
> > >>>> *Brian Towles* | Software Engineer
> > >>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> > >>> jond@cloudera.com>
> > >>>> cloudera.com <http://www.cloudera.com/>
> > >>>>
> > >>>> [image: Cloudera] <http://www.cloudera.com/>
> > >>>>
> > >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > >>> Cloudera
> > >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > >>>> ------------------------------
> > >>>>
> > >>>
> > >>
> >
> > --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>

Re: Shading of conflicting dependencies

Posted by Brian Towles <bt...@cloudera.com>.
So there would be two different levels of shading.

The first would be the known issues shared repackaged dependencies like
Guava.  This would only be for libraries that we absolutely know will cause
collisions with implementing services and implementing third party
libraries that need to have the references to the trouble causing libraries
shifted.  These trouble causing libraries would be the only ones where hard
coded import path changes would need to take place for.

For example a sentry-thridparty/sentry-shaded-miscellaneous module would
shared google guava shifting the package to
org.apache.sentry.thirdparty.com.google.common. As well as it would shade
the BoneCP jar so that it picks up the shift of the Guava package but not
shifting the BoneCP packages unless we encounter and a known conflict.

The second level os shading would be for the plugins distribution and would
shade and package shift only the dependencies needed for that plugin.  This
would not need to have any import statement changes as the maven shade
plugin would change the bytecode package references on packaging.  As to
Kalyan's question, the run time memory used for class loading would only be
for those items used by the plugin, which is really not much anyways
comparatively. The class loader doesn't load up a class until its needed.
Till that point it is just a reference to a file in a jar.  The list of
files/classes in jars is not a heavy memory load.

The main server implementation would not need to have the second level
shading done to it as it runs stand alone and is unaffected by the
collisions issues.

@Alexander For your questions, there is a small amount of refactoring of
import statements that need to take place for the first level of shading
but there are no logic changes that take place.

@Stephen Yeah I can see BouncyCastle being more of an issue but it does not
appear in any of the Sentry dependency chains right now.

-=Brian

On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com> wrote:

> +1
>
> Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> rather than known issue jars.  One thing though is what if there are bug
> fixes in our jars that are not included in other components, it may cause
> subtle error that are hard to track.  I’m thinking along the lines of the
> KeyValue class change in behavior.  So I propose limited the scope of
> shaded jars to known issues.
>
> However one thing to note is that some jars do not like to be shaded, as
> an example BouncyCastle has a check to see if the jar has been modified and
> will refuse to be used in a shaded jar.
>
> > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> kkalyan@cloudera.com> wrote:
> >
> > +1
> >
> > It's good approach but I have a question/concern.
> > Is the proposal to shade is for some specific jars OR to shade all the
> > third party jars?
> >
> > If proposal is shade all the third-party jars then if would impact the
> run
> > time memory usage as all the classes from the third-party jars would be
> > loaded regardless..
> >
> > -Kalyan
> >
> > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <se...@cloudera.com>
> > wrote:
> >
> >> +1
> >>
> >> I like the idea. It's hard to upgrade our libraries to newer ones when
> >> other components break due to Sentry being a plugin. I was once thought
> of
> >> using different jars versions per plugin (i.e. guava11 on hdfs, guava14
> on
> >> hive and the server, etc), but that is too much to do and not good. I
> like
> >> the hbase idea better so we can use the latest libraries that we want
> >> instead of depending on other components.
> >>
> >> - Sergio
> >>
> >>
> >>
> >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> akolb@cloudera.com>
> >> wrote:
> >>
> >>> Agreed, this would be a very useful thing to do. I remember spending a
> >> lot
> >>> of time trying to make Sentry work with DataNucleus 4 - the problem was
> >>> that e2e tests combine Sentry with Hive in the same JVM and this
> created
> >> a
> >>> conflict on the DataNucleus libraries and test failures.
> >>>
> >>> Looking at the HBase proposal it seems to require some (small) code
> >> changes
> >>> for package names - or I am misreading it? Do you plan to shade all
> 3-rd
> >>> party packages or just some? I would consider at least
> >>> Guava/ZooKeepeer/DataNucleus/Jetty.
> >>>
> >>> - Alex
> >>>
> >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
> >>> wrote:
> >>>
> >>>> Howdy all,
> >>>>
> >>>> An issue that keeps coming up seems to be the conflict of dependency
> >>>> versions between Sentry and the components it is plugging into. A
> >> current
> >>>> example of this impact is Google Guava with hive2 using v14 and Impala
> >>>> using v11 while Sentry needs to have at least v14 in order to fix some
> >>> bugs
> >>>> in the BoneCP library.  These sort of conflicts come in whenever we
> are
> >>>> embedding a Sentry plugin or using a Sentry library into another
> >> project.
> >>>>
> >>>> I would like to propose a mechanism for offsetting some of these
> issues
> >>>> using something similar to the third party shading that HBase uses for
> >>>> common problems (such as Guava)
> >>>> (https://github.com/apache/hbase-thirdparty with
> >>>> documentation about it http://hbase.apache.org/book.html#thirdparty)
> >> and
> >>>> is
> >>>> used by HBase and Hadoop for packaging of their downstream artifacts (
> >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> >>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> >>>> client-modules/hadoop-client-api/
> >>>> )
> >>>>
> >>>> The main benefit of this is that it would allow Sentry to be used as
> >>>> libraries and plugins with all of the dependencies needed for Sentry
> to
> >>> be
> >>>> abstracted away from components implementing it. Sentry could rev
> >>> versions
> >>>> of libraries easier and not have collisions of library versions needed
> >> by
> >>>> the implementing component.  As well this would potentially make
> Sentry
> >>>> downstream usage more stable since it would be used and tested against
> >> a
> >>>> static set of dependencies and not using libraries based on what the
> >>>> implementing component has.
> >>>>
> >>>> On the downside, it would make the on disk size of the Sentry plugins
> >> and
> >>>> libraries for downstream larger. As well, the number of classes loaded
> >>> into
> >>>> memory would be larger since there would be potential duplication of
> >>> actual
> >>>> class implementations or multiple versions of a class with different
> >>>> package names in memory.  But this seems to be a common practice and
> >> the
> >>>> lack of library version collisions and stability make up for these
> >>>> downsides.
> >>>>
> >>>> The third party shading works by using the Maven shade plugin to do
> >>> package
> >>>> name shifting of the third party library (Guava, BoneCP) to a sentry
> >>>> specific package using the version on the third party library needed
> >> for
> >>>> Sentry.
> >>>>
> >>>> E.G. *com.google.common* packages could be shifted to
> >>>> *org.apache.sentry.shaded.com.google.common*.
> >>>>
> >>>> Since the Maven Shade plugin can actually change the byte-codes of the
> >>>> libraries being shaded, we can do this even for dependencies that have
> >>>> shared sub-dependencies .  BoneCP being the main example since it uses
> >>>> Guava internal to itself.  We could shade the BoneCP into the same
> >> shared
> >>>> sentry third party dependency jar and since the bytecode level
> >>>> manipulation.
> >>>>
> >>>> What this means on the development side is that we would need to
> >>> reference
> >>>> imports from the shaded third party
> >>>>
> >>>> E.G. *import com.google.common.collect.Maps* becomes *import
> >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> >>>>
> >>>> Ive been looking at this in the context of
> >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> >>> should
> >>>> potentially be something that is more of an overall Sentry standard
> >>>> practice and larger scale implementation.
> >>>>
> >>>> -=Brian
> >>>>
> >>>>
> >>>> --
> >>>> *Brian Towles* | Software Engineer
> >>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> >>> jond@cloudera.com>
> >>>> cloudera.com <http://www.cloudera.com/>
> >>>>
> >>>> [image: Cloudera] <http://www.cloudera.com/>
> >>>>
> >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> >>> Cloudera
> >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> >>>> ------------------------------
> >>>>
> >>>
> >>
>
> --
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Re: Shading of conflicting dependencies

Posted by Stephen Moist <mo...@cloudera.com>.
+1

Agree with Kalyan’s concern.  To me it seems simpler to shade all jars rather than known issue jars.  One thing though is what if there are bug fixes in our jars that are not included in other components, it may cause subtle error that are hard to track.  I’m thinking along the lines of the KeyValue class change in behavior.  So I propose limited the scope of shaded jars to known issues.

However one thing to note is that some jars do not like to be shaded, as an example BouncyCastle has a check to see if the jar has been modified and will refuse to be used in a shaded jar.

> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> +1
> 
> It's good approach but I have a question/concern.
> Is the proposal to shade is for some specific jars OR to shade all the
> third party jars?
> 
> If proposal is shade all the third-party jars then if would impact the run
> time memory usage as all the classes from the third-party jars would be
> loaded regardless..
> 
> -Kalyan
> 
> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <se...@cloudera.com>
> wrote:
> 
>> +1
>> 
>> I like the idea. It's hard to upgrade our libraries to newer ones when
>> other components break due to Sentry being a plugin. I was once thought of
>> using different jars versions per plugin (i.e. guava11 on hdfs, guava14 on
>> hive and the server, etc), but that is too much to do and not good. I like
>> the hbase idea better so we can use the latest libraries that we want
>> instead of depending on other components.
>> 
>> - Sergio
>> 
>> 
>> 
>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <ak...@cloudera.com>
>> wrote:
>> 
>>> Agreed, this would be a very useful thing to do. I remember spending a
>> lot
>>> of time trying to make Sentry work with DataNucleus 4 - the problem was
>>> that e2e tests combine Sentry with Hive in the same JVM and this created
>> a
>>> conflict on the DataNucleus libraries and test failures.
>>> 
>>> Looking at the HBase proposal it seems to require some (small) code
>> changes
>>> for package names - or I am misreading it? Do you plan to shade all 3-rd
>>> party packages or just some? I would consider at least
>>> Guava/ZooKeepeer/DataNucleus/Jetty.
>>> 
>>> - Alex
>>> 
>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
>>> wrote:
>>> 
>>>> Howdy all,
>>>> 
>>>> An issue that keeps coming up seems to be the conflict of dependency
>>>> versions between Sentry and the components it is plugging into. A
>> current
>>>> example of this impact is Google Guava with hive2 using v14 and Impala
>>>> using v11 while Sentry needs to have at least v14 in order to fix some
>>> bugs
>>>> in the BoneCP library.  These sort of conflicts come in whenever we are
>>>> embedding a Sentry plugin or using a Sentry library into another
>> project.
>>>> 
>>>> I would like to propose a mechanism for offsetting some of these issues
>>>> using something similar to the third party shading that HBase uses for
>>>> common problems (such as Guava)
>>>> (https://github.com/apache/hbase-thirdparty with
>>>> documentation about it http://hbase.apache.org/book.html#thirdparty)
>> and
>>>> is
>>>> used by HBase and Hadoop for packaging of their downstream artifacts (
>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
>>>> client-modules/hadoop-client-api/
>>>> )
>>>> 
>>>> The main benefit of this is that it would allow Sentry to be used as
>>>> libraries and plugins with all of the dependencies needed for Sentry to
>>> be
>>>> abstracted away from components implementing it. Sentry could rev
>>> versions
>>>> of libraries easier and not have collisions of library versions needed
>> by
>>>> the implementing component.  As well this would potentially make Sentry
>>>> downstream usage more stable since it would be used and tested against
>> a
>>>> static set of dependencies and not using libraries based on what the
>>>> implementing component has.
>>>> 
>>>> On the downside, it would make the on disk size of the Sentry plugins
>> and
>>>> libraries for downstream larger. As well, the number of classes loaded
>>> into
>>>> memory would be larger since there would be potential duplication of
>>> actual
>>>> class implementations or multiple versions of a class with different
>>>> package names in memory.  But this seems to be a common practice and
>> the
>>>> lack of library version collisions and stability make up for these
>>>> downsides.
>>>> 
>>>> The third party shading works by using the Maven shade plugin to do
>>> package
>>>> name shifting of the third party library (Guava, BoneCP) to a sentry
>>>> specific package using the version on the third party library needed
>> for
>>>> Sentry.
>>>> 
>>>> E.G. *com.google.common* packages could be shifted to
>>>> *org.apache.sentry.shaded.com.google.common*.
>>>> 
>>>> Since the Maven Shade plugin can actually change the byte-codes of the
>>>> libraries being shaded, we can do this even for dependencies that have
>>>> shared sub-dependencies .  BoneCP being the main example since it uses
>>>> Guava internal to itself.  We could shade the BoneCP into the same
>> shared
>>>> sentry third party dependency jar and since the bytecode level
>>>> manipulation.
>>>> 
>>>> What this means on the development side is that we would need to
>>> reference
>>>> imports from the shaded third party
>>>> 
>>>> E.G. *import com.google.common.collect.Maps* becomes *import
>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>>>> 
>>>> Ive been looking at this in the context of
>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
>>> should
>>>> potentially be something that is more of an overall Sentry standard
>>>> practice and larger scale implementation.
>>>> 
>>>> -=Brian
>>>> 
>>>> 
>>>> --
>>>> *Brian Towles* | Software Engineer
>>>> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
>>> jond@cloudera.com>
>>>> cloudera.com <http://www.cloudera.com/>
>>>> 
>>>> [image: Cloudera] <http://www.cloudera.com/>
>>>> 
>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>>> Cloudera
>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>> ------------------------------
>>>> 
>>> 
>> 


Re: Shading of conflicting dependencies

Posted by Kalyan Kumar Kalvagadda <kk...@cloudera.com>.
+1

It's good approach but I have a question/concern.
Is the proposal to shade is for some specific jars OR to shade all the
third party jars?

If proposal is shade all the third-party jars then if would impact the run
time memory usage as all the classes from the third-party jars would be
loaded regardless..

-Kalyan

On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <se...@cloudera.com>
wrote:

> +1
>
> I like the idea. It's hard to upgrade our libraries to newer ones when
> other components break due to Sentry being a plugin. I was once thought of
> using different jars versions per plugin (i.e. guava11 on hdfs, guava14 on
> hive and the server, etc), but that is too much to do and not good. I like
> the hbase idea better so we can use the latest libraries that we want
> instead of depending on other components.
>
> - Sergio
>
>
>
> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <ak...@cloudera.com>
> wrote:
>
> > Agreed, this would be a very useful thing to do. I remember spending a
> lot
> > of time trying to make Sentry work with DataNucleus 4 - the problem was
> > that e2e tests combine Sentry with Hive in the same JVM and this created
> a
> > conflict on the DataNucleus libraries and test failures.
> >
> > Looking at the HBase proposal it seems to require some (small) code
> changes
> > for package names - or I am misreading it? Do you plan to shade all 3-rd
> > party packages or just some? I would consider at least
> > Guava/ZooKeepeer/DataNucleus/Jetty.
> >
> > - Alex
> >
> > On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
> > wrote:
> >
> > > Howdy all,
> > >
> > > An issue that keeps coming up seems to be the conflict of dependency
> > > versions between Sentry and the components it is plugging into. A
> current
> > > example of this impact is Google Guava with hive2 using v14 and Impala
> > > using v11 while Sentry needs to have at least v14 in order to fix some
> > bugs
> > > in the BoneCP library.  These sort of conflicts come in whenever we are
> > > embedding a Sentry plugin or using a Sentry library into another
> project.
> > >
> > > I would like to propose a mechanism for offsetting some of these issues
> > > using something similar to the third party shading that HBase uses for
> > > common problems (such as Guava)
> > > (https://github.com/apache/hbase-thirdparty with
> > > documentation about it http://hbase.apache.org/book.html#thirdparty)
> and
> > > is
> > > used by HBase and Hadoop for packaging of their downstream artifacts (
> > > https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > > https://github.com/apache/hadoop/tree/trunk/hadoop-
> > > client-modules/hadoop-client-api/
> > > )
> > >
> > > The main benefit of this is that it would allow Sentry to be used as
> > > libraries and plugins with all of the dependencies needed for Sentry to
> > be
> > > abstracted away from components implementing it. Sentry could rev
> > versions
> > > of libraries easier and not have collisions of library versions needed
> by
> > > the implementing component.  As well this would potentially make Sentry
> > > downstream usage more stable since it would be used and tested against
> a
> > > static set of dependencies and not using libraries based on what the
> > > implementing component has.
> > >
> > > On the downside, it would make the on disk size of the Sentry plugins
> and
> > > libraries for downstream larger. As well, the number of classes loaded
> > into
> > > memory would be larger since there would be potential duplication of
> > actual
> > > class implementations or multiple versions of a class with different
> > > package names in memory.  But this seems to be a common practice and
> the
> > > lack of library version collisions and stability make up for these
> > > downsides.
> > >
> > > The third party shading works by using the Maven shade plugin to do
> > package
> > > name shifting of the third party library (Guava, BoneCP) to a sentry
> > > specific package using the version on the third party library needed
> for
> > > Sentry.
> > >
> > > E.G. *com.google.common* packages could be shifted to
> > > *org.apache.sentry.shaded.com.google.common*.
> > >
> > > Since the Maven Shade plugin can actually change the byte-codes of the
> > > libraries being shaded, we can do this even for dependencies that have
> > > shared sub-dependencies .  BoneCP being the main example since it uses
> > > Guava internal to itself.  We could shade the BoneCP into the same
> shared
> > > sentry third party dependency jar and since the bytecode level
> > > manipulation.
> > >
> > > What this means on the development side is that we would need to
> > reference
> > > imports from the shaded third party
> > >
> > > E.G. *import com.google.common.collect.Maps* becomes *import
> > > org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > >
> > > Ive been looking at this in the context of
> > > https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> > should
> > > potentially be something that is more of an overall Sentry standard
> > > practice and larger scale implementation.
> > >
> > > -=Brian
> > >
> > >
> > > --
> > > *Brian Towles* | Software Engineer
> > > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> > jond@cloudera.com>
> > > cloudera.com <http://www.cloudera.com/>
> > >
> > > [image: Cloudera] <http://www.cloudera.com/>
> > >
> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > Cloudera
> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > ------------------------------
> > >
> >
>

Re: Shading of conflicting dependencies

Posted by Na Li <li...@cloudera.com>.
+1

This gives sentry more flexibility without impacting other components using
sentry.

On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <se...@cloudera.com>
wrote:

> +1
>
> I like the idea. It's hard to upgrade our libraries to newer ones when
> other components break due to Sentry being a plugin. I was once thought of
> using different jars versions per plugin (i.e. guava11 on hdfs, guava14 on
> hive and the server, etc), but that is too much to do and not good. I like
> the hbase idea better so we can use the latest libraries that we want
> instead of depending on other components.
>
> - Sergio
>
>
>
> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <ak...@cloudera.com>
> wrote:
>
> > Agreed, this would be a very useful thing to do. I remember spending a
> lot
> > of time trying to make Sentry work with DataNucleus 4 - the problem was
> > that e2e tests combine Sentry with Hive in the same JVM and this created
> a
> > conflict on the DataNucleus libraries and test failures.
> >
> > Looking at the HBase proposal it seems to require some (small) code
> changes
> > for package names - or I am misreading it? Do you plan to shade all 3-rd
> > party packages or just some? I would consider at least
> > Guava/ZooKeepeer/DataNucleus/Jetty.
> >
> > - Alex
> >
> > On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
> > wrote:
> >
> > > Howdy all,
> > >
> > > An issue that keeps coming up seems to be the conflict of dependency
> > > versions between Sentry and the components it is plugging into. A
> current
> > > example of this impact is Google Guava with hive2 using v14 and Impala
> > > using v11 while Sentry needs to have at least v14 in order to fix some
> > bugs
> > > in the BoneCP library.  These sort of conflicts come in whenever we are
> > > embedding a Sentry plugin or using a Sentry library into another
> project.
> > >
> > > I would like to propose a mechanism for offsetting some of these issues
> > > using something similar to the third party shading that HBase uses for
> > > common problems (such as Guava)
> > > (https://github.com/apache/hbase-thirdparty with
> > > documentation about it http://hbase.apache.org/book.html#thirdparty)
> and
> > > is
> > > used by HBase and Hadoop for packaging of their downstream artifacts (
> > > https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > > https://github.com/apache/hadoop/tree/trunk/hadoop-
> > > client-modules/hadoop-client-api/
> > > )
> > >
> > > The main benefit of this is that it would allow Sentry to be used as
> > > libraries and plugins with all of the dependencies needed for Sentry to
> > be
> > > abstracted away from components implementing it. Sentry could rev
> > versions
> > > of libraries easier and not have collisions of library versions needed
> by
> > > the implementing component.  As well this would potentially make Sentry
> > > downstream usage more stable since it would be used and tested against
> a
> > > static set of dependencies and not using libraries based on what the
> > > implementing component has.
> > >
> > > On the downside, it would make the on disk size of the Sentry plugins
> and
> > > libraries for downstream larger. As well, the number of classes loaded
> > into
> > > memory would be larger since there would be potential duplication of
> > actual
> > > class implementations or multiple versions of a class with different
> > > package names in memory.  But this seems to be a common practice and
> the
> > > lack of library version collisions and stability make up for these
> > > downsides.
> > >
> > > The third party shading works by using the Maven shade plugin to do
> > package
> > > name shifting of the third party library (Guava, BoneCP) to a sentry
> > > specific package using the version on the third party library needed
> for
> > > Sentry.
> > >
> > > E.G. *com.google.common* packages could be shifted to
> > > *org.apache.sentry.shaded.com.google.common*.
> > >
> > > Since the Maven Shade plugin can actually change the byte-codes of the
> > > libraries being shaded, we can do this even for dependencies that have
> > > shared sub-dependencies .  BoneCP being the main example since it uses
> > > Guava internal to itself.  We could shade the BoneCP into the same
> shared
> > > sentry third party dependency jar and since the bytecode level
> > > manipulation.
> > >
> > > What this means on the development side is that we would need to
> > reference
> > > imports from the shaded third party
> > >
> > > E.G. *import com.google.common.collect.Maps* becomes *import
> > > org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > >
> > > Ive been looking at this in the context of
> > > https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> > should
> > > potentially be something that is more of an overall Sentry standard
> > > practice and larger scale implementation.
> > >
> > > -=Brian
> > >
> > >
> > > --
> > > *Brian Towles* | Software Engineer
> > > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> > jond@cloudera.com>
> > > cloudera.com <http://www.cloudera.com/>
> > >
> > > [image: Cloudera] <http://www.cloudera.com/>
> > >
> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > Cloudera
> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > ------------------------------
> > >
> >
>

Re: Shading of conflicting dependencies

Posted by Sergio Pena <se...@cloudera.com>.
+1

I like the idea. It's hard to upgrade our libraries to newer ones when
other components break due to Sentry being a plugin. I was once thought of
using different jars versions per plugin (i.e. guava11 on hdfs, guava14 on
hive and the server, etc), but that is too much to do and not good. I like
the hbase idea better so we can use the latest libraries that we want
instead of depending on other components.

- Sergio



On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <ak...@cloudera.com>
wrote:

> Agreed, this would be a very useful thing to do. I remember spending a lot
> of time trying to make Sentry work with DataNucleus 4 - the problem was
> that e2e tests combine Sentry with Hive in the same JVM and this created a
> conflict on the DataNucleus libraries and test failures.
>
> Looking at the HBase proposal it seems to require some (small) code changes
> for package names - or I am misreading it? Do you plan to shade all 3-rd
> party packages or just some? I would consider at least
> Guava/ZooKeepeer/DataNucleus/Jetty.
>
> - Alex
>
> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com>
> wrote:
>
> > Howdy all,
> >
> > An issue that keeps coming up seems to be the conflict of dependency
> > versions between Sentry and the components it is plugging into. A current
> > example of this impact is Google Guava with hive2 using v14 and Impala
> > using v11 while Sentry needs to have at least v14 in order to fix some
> bugs
> > in the BoneCP library.  These sort of conflicts come in whenever we are
> > embedding a Sentry plugin or using a Sentry library into another project.
> >
> > I would like to propose a mechanism for offsetting some of these issues
> > using something similar to the third party shading that HBase uses for
> > common problems (such as Guava)
> > (https://github.com/apache/hbase-thirdparty with
> > documentation about it http://hbase.apache.org/book.html#thirdparty) and
> > is
> > used by HBase and Hadoop for packaging of their downstream artifacts (
> > https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > https://github.com/apache/hadoop/tree/trunk/hadoop-
> > client-modules/hadoop-client-api/
> > )
> >
> > The main benefit of this is that it would allow Sentry to be used as
> > libraries and plugins with all of the dependencies needed for Sentry to
> be
> > abstracted away from components implementing it. Sentry could rev
> versions
> > of libraries easier and not have collisions of library versions needed by
> > the implementing component.  As well this would potentially make Sentry
> > downstream usage more stable since it would be used and tested against a
> > static set of dependencies and not using libraries based on what the
> > implementing component has.
> >
> > On the downside, it would make the on disk size of the Sentry plugins and
> > libraries for downstream larger. As well, the number of classes loaded
> into
> > memory would be larger since there would be potential duplication of
> actual
> > class implementations or multiple versions of a class with different
> > package names in memory.  But this seems to be a common practice and the
> > lack of library version collisions and stability make up for these
> > downsides.
> >
> > The third party shading works by using the Maven shade plugin to do
> package
> > name shifting of the third party library (Guava, BoneCP) to a sentry
> > specific package using the version on the third party library needed for
> > Sentry.
> >
> > E.G. *com.google.common* packages could be shifted to
> > *org.apache.sentry.shaded.com.google.common*.
> >
> > Since the Maven Shade plugin can actually change the byte-codes of the
> > libraries being shaded, we can do this even for dependencies that have
> > shared sub-dependencies .  BoneCP being the main example since it uses
> > Guava internal to itself.  We could shade the BoneCP into the same shared
> > sentry third party dependency jar and since the bytecode level
> > manipulation.
> >
> > What this means on the development side is that we would need to
> reference
> > imports from the shaded third party
> >
> > E.G. *import com.google.common.collect.Maps* becomes *import
> > org.apache.sentry.shaded.com.google.common.collect.Maps*.
> >
> > Ive been looking at this in the context of
> > https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> should
> > potentially be something that is more of an overall Sentry standard
> > practice and larger scale implementation.
> >
> > -=Brian
> >
> >
> > --
> > *Brian Towles* | Software Engineer
> > t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <
> jond@cloudera.com>
> > cloudera.com <http://www.cloudera.com/>
> >
> > [image: Cloudera] <http://www.cloudera.com/>
> >
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
> >
>

Re: Shading of conflicting dependencies

Posted by Alexander Kolbasov <ak...@cloudera.com>.
Agreed, this would be a very useful thing to do. I remember spending a lot
of time trying to make Sentry work with DataNucleus 4 - the problem was
that e2e tests combine Sentry with Hive in the same JVM and this created a
conflict on the DataNucleus libraries and test failures.

Looking at the HBase proposal it seems to require some (small) code changes
for package names - or I am misreading it? Do you plan to shade all 3-rd
party packages or just some? I would consider at least
Guava/ZooKeepeer/DataNucleus/Jetty.

- Alex

On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <bt...@cloudera.com> wrote:

> Howdy all,
>
> An issue that keeps coming up seems to be the conflict of dependency
> versions between Sentry and the components it is plugging into. A current
> example of this impact is Google Guava with hive2 using v14 and Impala
> using v11 while Sentry needs to have at least v14 in order to fix some bugs
> in the BoneCP library.  These sort of conflicts come in whenever we are
> embedding a Sentry plugin or using a Sentry library into another project.
>
> I would like to propose a mechanism for offsetting some of these issues
> using something similar to the third party shading that HBase uses for
> common problems (such as Guava)
> (https://github.com/apache/hbase-thirdparty with
> documentation about it http://hbase.apache.org/book.html#thirdparty) and
> is
> used by HBase and Hadoop for packaging of their downstream artifacts (
> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> https://github.com/apache/hadoop/tree/trunk/hadoop-
> client-modules/hadoop-client-api/
> )
>
> The main benefit of this is that it would allow Sentry to be used as
> libraries and plugins with all of the dependencies needed for Sentry to be
> abstracted away from components implementing it. Sentry could rev versions
> of libraries easier and not have collisions of library versions needed by
> the implementing component.  As well this would potentially make Sentry
> downstream usage more stable since it would be used and tested against a
> static set of dependencies and not using libraries based on what the
> implementing component has.
>
> On the downside, it would make the on disk size of the Sentry plugins and
> libraries for downstream larger. As well, the number of classes loaded into
> memory would be larger since there would be potential duplication of actual
> class implementations or multiple versions of a class with different
> package names in memory.  But this seems to be a common practice and the
> lack of library version collisions and stability make up for these
> downsides.
>
> The third party shading works by using the Maven shade plugin to do package
> name shifting of the third party library (Guava, BoneCP) to a sentry
> specific package using the version on the third party library needed for
> Sentry.
>
> E.G. *com.google.common* packages could be shifted to
> *org.apache.sentry.shaded.com.google.common*.
>
> Since the Maven Shade plugin can actually change the byte-codes of the
> libraries being shaded, we can do this even for dependencies that have
> shared sub-dependencies .  BoneCP being the main example since it uses
> Guava internal to itself.  We could shade the BoneCP into the same shared
> sentry third party dependency jar and since the bytecode level
> manipulation.
>
> What this means on the development side is that we would need to reference
> imports from the shaded third party
>
> E.G. *import com.google.common.collect.Maps* becomes *import
> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>
> Ive been looking at this in the context of
> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this should
> potentially be something that is more of an overall Sentry standard
> practice and larger scale implementation.
>
> -=Brian
>
>
> --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btowles@cloudera.com <jo...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>