Posted to dev@hive.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2021/09/06 10:48:22 UTC

Re: hive-exec vs. hive-exec:core

Hi Dan,

Thanks for kicking off this discussion and taking the time to propose
solutions.

As you correctly mentioned, the recommendation from the Hive team is to
always use the hive-exec.jar (dependency) and never rely on
hive-exec-core.jar.

Indeed, this may lead to binary incompatibility problems such as the one you
mentioned. If I understood correctly, the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?
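
To make the signature mismatch concrete, here is a minimal Java sketch. It is
only an illustration: the names OriginalSomeClass, RelocatedSomeClass and
UnshadedB are hypothetical stand-ins for com.library.alib.SomeClass, its
relocated copy, and library B as found in the unshaded b.jar. The sketch uses
a reflective lookup, which fails with NoSuchMethodException; in the real
scenario the JVM would raise NoSuchMethodError at link time, but the cause is
the same: the caller asks for a parameter type that the loaded method does not
declare.

```java
import java.lang.reflect.Method;

public class RelocationMismatchDemo {
    // Hypothetical stand-in for com.library.alib.SomeClass (original package).
    static class OriginalSomeClass {}

    // Hypothetical stand-in for org.apache.hive.com.library.alib.SomeClass
    // (the relocated copy bundled in the fat jar).
    static class RelocatedSomeClass {}

    // Hypothetical stand-in for library B as shipped in the unshaded b.jar:
    // its method still takes the original, non-relocated type.
    static class UnshadedB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Returns true if UnshadedB exposes someMethod with exactly this parameter type.
    static boolean hasSomeMethod(Class<?> parameterType) {
        try {
            Method m = UnshadedB.class.getMethod("someMethod", parameterType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // The reflective analogue of the NoSuchMethodError described in the thread.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup with original type:  " + hasSomeMethod(OriginalSomeClass.class));
        System.out.println("lookup with relocated type: " + hasSomeMethod(RelocatedSomeClass.class));
    }
}
```

The lookup only succeeds for the original parameter type, which is why the
outcome depends on whether the relocated or the unrelocated copy of B is
picked up first on the classpath.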

If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed to this route.

Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fd...@cloudera.com.invalid>
wrote:

> Dear Hive developers,
>
> I am Dan from the Oozie team and I would like to bring up the
> hive-exec.jar vs. hive-exec-core.jar topic.
> The reason is that, as far as we understand, the official
> recommendation from the Hive team is to use the hive-exec.jar artifact.
>
> However, in Oozie that can end up in a binary incompatibility.
>
> The reason for that is:
>
>   * Let's say library A is included in the fat Jar.
>
>   * And library B which is using library A is also included in the fat Jar.
>
>   * Let's also say that library A's com.library.alib package is
>     relocated to org.apache.hive.com.library.alib,
>     meaning the com.library.alib.SomeClass becomes
>     org.apache.hive.com.library.alib.SomeClass
>
>   * So if B has a method like public void
>     someMethod(com.library.alib.SomeClass) then the signature of this
>     method will be changed to:
>     public void someMethod(org.apache.hive.com.library.alib.SomeClass)
>
>   * If Oozie is also using B directly, we'll have b.jar on our
>     classpath, but with the unchanged signature.
>     So when hive-exec tries to invoke someMethod, then depending on
>     whether b.jar coming from us is loaded first or hive-exec is,
>     we can end up with a NoSuchMethodError if hive-exec tries to pass an
>     org.apache.hive.com.library.alib.SomeClass instance to the
>     someMethod which was loaded from the original b.jar.
>
> Hence in Oozie a long time ago (OOZIE-2621
> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> made to use the hive-exec-core Jar.
>
> Now since the shading process actually removes from the hive-exec pom
> those dependencies which are included in the fat Jar, we manually had to
> add some dependencies to Oozie to compensate for this.
> However, these dependencies are not used by Oozie directly, and with the
> growing features of hive-exec we had to repeat the same process
> over and over, which is a bit unmaintainable.
>
> Today I'm writing to you to propose a long-term solution where basically
> nothing would change in the generated Hive artifacts and poms, and at the
> same time we wouldn't have to manually declare dependencies in Oozie which
> are not explicitly used by us.
>
> The solution:
>
>  1. We would create a new module named hive-exec-dependencies which
>     would be a pom-packaging module without any Java source files.
>  2. All the dependencies declared in hive-exec would be moved to
>     hive-exec-dependencies.
>  3. We would make the hive-exec-dependencies module the parent of
>     hive-exec and with this hive-exec would still have access to the
>     same dependencies as before.
>  4. The maven shade plugin would still strip the dependencies from the
>     generated hive-exec pom which are included in the fat Jar.
>  5. And with a small maven plugin we'd change hive-exec's parent back
>     from hive-exec-dependencies to the root hive project in the
>     generated hive-exec pom file.
>
> I have a change ready locally and it works as described above.
>
> With this on the Oozie side we could add a dependency on
> hive-exec-dependencies and hence all the required libraries which are
> included in the fat Jar would be pulled into Oozie.
> The next time a new dependency would be added to hive-exec-dependencies,
> the Oozie build would pull it in automatically without us having to
> explicitly declare it.
>
> Please let me know what you think.
>
> Best,
> Dan
>

Re: hive-exec vs. hive-exec:core

Posted by Chao Sun <su...@apache.org>.
I'm fine as long as we are committed to fixing the shading problems before
the release. Ideally I think we should fix the shading problems first and
then remove the hive-exec:core jar though (which is why I said it's a bit
premature to do it now).

On Thu, Nov 18, 2021 at 8:28 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hello,
>
> I don't see any risk committing this right now in master. It will only
> affect the new Hive release when and if it ever goes out.
> Till then we have plenty of time to fix shading problems and help other
> projects migrate to the "recommended" way to use Hive.
>
> Moreover, I don't know many projects relying on this kind of "double" (core
> vs. fat) publication of dependencies. For Hive, it creates additional
> maintenance cost and for its users confusion on what they should use.
> If for whatever reason, another project does not want to include everything
> coming in the fat jar, maven provides ways to do it. I wouldn't recommend
> going down this path but there are alternatives.
>
> Best,
> Stamatis
>
> On Wed, Nov 17, 2021 at 8:15 PM Zoltan Haindrich <ki...@rxd.hu> wrote:
>
> >
> >
> > On 11/17/21 7:46 PM, Chao Sun wrote:
> > >> We have a working hive-exec jar
> > >
> > > I'm not sure about this. The issue comes when the fat hive-exec jar
> > > shades some jars but doesn't relocate them. In this case there is no
> > > way for the downstream projects to resolve the conflict.
> >
> > Exactly - I think those should be hammered out for good; fix the
> > shading/relocation!
> >
> > >
> > > On the Spark side IIUC we had issues with Apache Commons as well as ORC
> > > (see HIVE-25317 for an effort on this), and there could be more. Spark
> > > is using Hive 2.3 though but the same applies for master/4.0 if
> > > dependency versions differ between Hive and the downstream projects.
> >
> > This change is only about master - it won't change Hive 2.3. HIVE-25317
> > was for branch-2 as well.
> > I've seen weird stuff in a few places because they were not able to use
> > the hive-exec jar as-is.
> > Folks in the Impala project, for example, went in a direction to
> > re-shade/re-filter the hive-exec jar and relocate some stuff in it - most
> > likely because it conflicted with their stuff.
> >
> > https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
> > Taking a quick look at https://github.com/apache/spark/pull/33989/files
> > it seems like you've also done something similar... but instead of using
> > the base artifact, you have created a new shader.
> > I don't think this is better than having an artifact which simply works
> > out-of-the-box.
> >
> >
> > cheers,
> > Zoltan
> >
> > >
> > > On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> > >
> > >> On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> > >>> For Oozie we've decided to use fat Jar downstream (Cloudera) as there
> > >>> we have processes to ensure 3rd-party library versions are kept in sync.
> > >>>
> > >>> Since we don't have such a process in Apache, there we'll continue to
> > >>> use the core Jar.
> > >>
> > >> It might be possible to evade some problems by using a 3rd party lib
> > >> syncer - but if we've done a good job shading this stuff; it should not
> > >> cause any trouble even in case other 3rd party stuff is present....but
> > >> in any case to check things out you will need a Hive release in some form
> > >>
> > >> cheers,
> > >> Zoltan
> > >>
> > >>>
> > >>> Dan
> > >>>
> > >>> On 2021. 11. 17. 18:50, Chao Sun wrote:
> > >>>>> the idea is to fix the issues they bump into - because people who
> > load
> > >>>> the jdbc driver may also see those issues.
> > >>>>
> > >>>> I don’t get what you mean here, could you elaborate a bit more?
> > >>>>
> > >>>> IMO it's a bit premature to do this without a working hive-exec jar
> > for
> > >>>> downstream projects like Spark/Trino/Presto. At the current state
> > there
> > >> is
> > >>>> no way to upgrade these projects to use the fat hive-exec jar.
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich<ki...@rxd.hu>
> wrote:
> > >>>>
> > >>>>> Hey all,
> > >>>>>
> > >>>>> I wanted to get back to this - but had other things going on.
> > >>>>>
> > >>>>> Chao> it is still being used today by some other popular projects
> > >>>>> the idea is to fix the issues they bump into - because people who
> > load
> > >> the
> > >>>>> jdbc driver may also see those issues.
> > >>>>>
> > >>>>> Edward> [...] You all must like enjoy shading jars.
> > >>>>> I totally agree that they may use a shell action as well.
> > >>>>> I wonder how do you propose to solve issues related to clients
> using
> > a
> > >>>>> different version of the guava library?
> > >>>>>
> > >>>>> The changes which will remove the core artifact stuff is ready:
> > >>>>> https://github.com/apache/hive/pull/2648
> > >>>>>
> > >>>>> cheers,
> > >>>>> Zoltan
> > >>>>>
> > >>>>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> > >>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >> artifact.
> > >>>>>>
> > >>>>>> You know about 10 years ago. I mentioned that oozie should just
> use
> > >>>>>> hive-service or hive jdbc. After a big fight where folks kept
> > >> bringing up
> > >>>>>> concurrency bugs in hive-server-1 my prs were rejected (even
> though
> > >> hive
> > >>>>>> server2 would not have these bugs). I still cannot fathom why
> > someone
> > >>>>> using
> > >>>>>> oozie would want a fat jar of hive (as opposed to hive server or
> > >>>>> hivejdbc)
> > >>>>>> . If I had to do that, i would just use shell action..... You all
> > must
> > >>>>> like
> > >>>>>> enjoy shading jars.
> > >>>>>>
> > >>>>>> Edward
> > >>>>>>
> > >>>>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun<su...@apache.org>
> > wrote:
> > >>>>>>
> > >>>>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
> > >>>>>>> completely - it is still being used today by some other popular
> > >> projects
> > >>>>>>> including Spark and Trino/Presto. By sticking to `hive-exec-core`
> > it
> > >>>>> gives
> > >>>>>>> more flexibility to the other projects to shade & relocate those
> > >> classes
> > >>>>>>> according to their need, without waiting for new Hive releases.
> > Hive
> > >>>>> also
> > >>>>>>> needs to make sure it relocate everything properly. Otherwise, if
> > >> some
> > >>>>>>> classes are shaded & included in `hive-exec` but not relocated,
> > there
> > >>>>> is no
> > >>>>>>> way for the other projects to exclude them and avoid potential
> > >>>>> conflicts.
> > >>>>>>> Chao
> > >>>>>>>
> > >>>>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich<ki...@rxd.hu>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Hey
> > >>>>>>>>
> > >>>>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > >>>>>>>>> Indeed this may lead to binary incompatibility problems as the
> > one
> > >> you
> > >>>>>>>>> mentioned. If I understood correctly the problem you cite comes
> > up
> > >> if
> > >>>>>>>>> library B in this case is not relocated. If Hive systematically
> > >>>>>>> relocates
> > >>>>>>>>> shaded deps do you think there will still be binary
> > incompatibility
> > >>>>>>>> issues?
> > >>>>>>>>> If the relocating solution works, I would personally prefer
> going
> > >> down
> > >>>>>>>> this
> > >>>>>>>>> path instead of introducing an entirely new module just for the
> > >> sake
> > >>>>> of
> > >>>>>>>>> dependency management. Most of the time when there are problems
> > >> with
> > >>>>>>>>> shading the answer comes from relocating the problematic
> > >> dependencies
> > >>>>>>> and
> > >>>>>>>>> people are more or less accustomed with this route.
> > >>>>>>>> I totally agree with you Stamatis - with the addition that we
> > should
> > >>>>> work
> > >>>>>>>> together with the owners of other projects to help them use the
> > >> correct
> > >>>>>>>> artifact to gain access to
> > >>>>>>>> Hive's internal parts.
> > >>>>>>>> I've opened HIVE-25531 to remove the core classified artifact -
> > and
> > >>>>>>> ensure
> > >>>>>>>> that we will be uncovering and fixing future issues with the
> > >> hive-exec
> > >>>>>>>> artifact.
> > >>>>>>>>
> > >>>>>>>> cheers,
> > >>>>>>>> Zoltan
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Stamatis
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
> > >>>>>>>> <fd...@cloudera.com.invalid>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Dear Hive developers,
> > >>>>>>>>>>
> > >>>>>>>>>> I am Dan from the Oozie team and I would like to bring up the
> > >>>>>>>>>> hive-exec.jar vs. hive-exec-core.jar topic.
> > >>>>>>>>>> The reason for that is because as far as we understand the
> > >> official
> > >>>>>>>>>> recommendation from the Hive team is to use the hive-exec.jar
> > >>>>>>> artifact.
> > >>>>>>>>>> However in Oozie that can end-up in a binary incompatibility.
> > >>>>>>>>>>
> > >>>>>>>>>> The reason for that is:
> > >>>>>>>>>>
> > >>>>>>>>>>       * Let's say library A is included in the fat Jar.
> > >>>>>>>>>>
> > >>>>>>>>>>       * And library B which is using library A is also
> included
> > in
> > >> the
> > >>>>>>> fat
> > >>>>>>>> Jar.
> > >>>>>>>>>>       * Let's also say that library A's com.library.alib
> > package is
> > >>>>>>>>>>         relocated to org.apache.hive.com.library.alib,
> > >>>>>>>>>>         meaning the com.library.alib.SomeClass becomes
> > >>>>>>>>>>         org.apache.hive.com.library.alib.SomeClass
> > >>>>>>>>>>
> > >>>>>>>>>>       * So if B has a method like public void
> > >>>>>>>>>>         someMethod(com.library.alib.SomeClass) then the
> > signature
> > >> of
> > >>>>> this
> > >>>>>>>>>>         method will be changed to:
> > >>>>>>>>>>         public void
> > >>>>>>> someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>>>>>>>>>       * If Oozie is also using B directly meaning we'll have
> > b.jar
> > >> on
> > >>>>> our
> > >>>>>>>>>>         classpath, but with the unchanged signature,
> > >>>>>>>>>>         so when hive-exec tries to invoke someMethod then
> > >> depending on
> > >>>>>>>>>>         whether b.jar coming from us will be loaded first or
> > >> hive-exec
> > >>>>>>>> will,
> > >>>>>>>>>>         we can end-up with a NoSuchMethodError is hive-exec
> > tries
> > >> to
> > >>>>> pass
> > >>>>>>>> an
> > >>>>>>>>>>         org.apache.hive.com.library.alib.SomeClass instance to
> > the
> > >>>>>>>>>>         someMethod which was loaded from the original b.jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Hence in Oozie a long time ago (OOZIE-2621
> > >>>>>>>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the
> > decision
> > >> was
> > >>>>>>>>>> made to use the hive-exec-core Jar.
> > >>>>>>>>>>
> > >>>>>>>>>> Now since the shading process actually removes those
> > dependencies
> > >>>>> from
> > >>>>>>>>>> the hive-exec pom which are included in the fat Jar, we
> manually
> > >> had
> > >>>>>>> to
> > >>>>>>>>>> add some dependencies to Oozie to compensate this.
> > >>>>>>>>>> However these dependencies are not used by Oozie directly and
> > with
> > >>>>> the
> > >>>>>>>>>> growing features of hive-exec we had to repeat the same
> process
> > >>>>>>>>>> over-and-over which is a bit unmaintainable.
> > >>>>>>>>>>
> > >>>>>>>>>> Today I'm writing to you to propose a long-term solution where
> > >>>>>>> basically
> > >>>>>>>>>> nothing would change in the generated hive artifacts, poms and
> > the
> > >>>>>>> same
> > >>>>>>>>>> time we wouldn't have to manually declare dependencies in
> Oozie
> > >> which
> > >>>>>>>>>> are not explicitly used by us.
> > >>>>>>>>>>
> > >>>>>>>>>> The solution:
> > >>>>>>>>>>
> > >>>>>>>>>>      1. We would create a new module named
> > hive-exec-dependencies
> > >> which
> > >>>>>>>>>>         would be a pom-packaging module without any Java
> source
> > >> files.
> > >>>>>>>>>>      2. All the dependencies declared in hive-exec would be
> > moved
> > >> to
> > >>>>>>>>>>         hive-exec-dependencies.
> > >>>>>>>>>>      3. We would make the hive-exec-dependencies module the
> > parent
> > >> of
> > >>>>>>>>>>         hive-exec and with this hive-exec would still have
> > access
> > >> to
> > >>>>> the
> > >>>>>>>>>>         same dependencies as before.
> > >>>>>>>>>>      4. The maven shade plugin would still strip the
> > dependencies
> > >> from
> > >>>>>>> the
> > >>>>>>>>>>         generated hive-exec pom which are included in the fat
> > Jar.
> > >>>>>>>>>>      5. And with a small maven plugin we'd change hive-exec's
> > >> parent
> > >>>>> back
> > >>>>>>>>>>         from hive-exec-dependencies to the root hive project
> in
> > the
> > >>>>>>>>>>         generated hive-exec pom file.
> > >>>>>>>>>>
> > >>>>>>>>>> I have a change ready locally and it works as described above.
> > >>>>>>>>>>
> > >>>>>>>>>> With this on the Oozie side we could add a dependency on
> > >>>>>>>>>> hive-exec-dependencies and hence all the required libraries
> > which
> > >> are
> > >>>>>>>>>> included in the fat Jar would be pulled into Oozie.
> > >>>>>>>>>> The next time a new dependency would be added to
> > >>>>>>> hive-exec-dependencies,
> > >>>>>>>>>> the Oozie build would pull it in automatically without us
> having
> > >> to
> > >>>>>>>>>> explicitly declare it.
> > >>>>>>>>>>
> > >>>>>>>>>> Please let me know what you think.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Dan
> > >>>>>>>>>>
> > >>>
> > >>
> > >
> >
>

Re: hive-exec vs. hive-exec:core

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

I don't see any risk in committing this to master right now. It will only
affect the new Hive release when and if it ever goes out.
Till then we have plenty of time to fix shading problems and help other
projects migrate to the "recommended" way to use Hive.

Moreover, I don't know many projects relying on this kind of "double" (core
vs. fat) publication of dependencies. For Hive, it creates additional
maintenance cost and, for its users, confusion about which one they should use.
If, for whatever reason, another project does not want to include everything
coming in the fat jar, Maven provides ways to do it. I wouldn't recommend
going down this path but there are alternatives.

Best,
Stamatis

On Wed, Nov 17, 2021 at 8:15 PM Zoltan Haindrich <ki...@rxd.hu> wrote:

>
>
> On 11/17/21 7:46 PM, Chao Sun wrote:
> >> We have a working hive-exec jar
> >
> > I'm not sure about this. The issue comes when the fat hive-exec jar
> > shades some jars but doesn't relocate them. In this case there is no way
> > for the downstream projects to resolve the conflict.
>
> Exactly - I think those should be hammered out for good; fix the
> shading/relocation!
>
> >
> > On the Spark side IIUC we had issues with Apache Commons as well as ORC
> > (see HIVE-25317 for an effort on this), and there could be more. Spark is
> > using Hive 2.3 though but the same applies for master/4.0 if dependency
> > versions differ between Hive and the downstream projects.
>
> This change is only about master - it won't change Hive 2.3. HIVE-25317
> was for branch-2 as well.
> I've seen weird stuff in a few places because they were not able to use the
> hive-exec jar as-is.
> Folks in the Impala project, for example, went in a direction to
> re-shade/re-filter the hive-exec jar and relocate some stuff in it - most
> likely because it conflicted with their stuff.
>
> https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
> Taking a quick look at https://github.com/apache/spark/pull/33989/files
> it seems like you've also done something similar... but instead of using
> the base artifact, you have created a new shader.
> I don't think this is better than having an artifact which simply works
> out-of-the-box.
>
>
> cheers,
> Zoltan
>
> >
> > On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> >
> >> On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> >>> For Oozie we've decided to use fat Jar downstream (Cloudera) as there
> >>> we have processes to ensure 3rd-party library versions are kept in sync.
> >>>
> >>> Since we don't have such a process in Apache, there we'll continue to
> >>> use the core Jar.
> >>
> >> It might be possible to evade some problems by using a 3rd party lib
> >> syncer - but if we've done a good job shading this stuff; it should not
> >> cause any trouble even in case
> >> other 3rd party stuff is present....but in any case to check things out
> >> you will need a Hive release in some form
> >>
> >> cheers,
> >> Zoltan
> >>
> >>>
> >>> Dan
> >>>
> >>> On 2021. 11. 17. 18:50, Chao Sun wrote:
> >>>>> the idea is to fix the issues they bump into - because people who
> load
> >>>> the jdbc driver may also see those issues.
> >>>>
> >>>> I don’t get what you mean here, could you elaborate a bit more?
> >>>>
> >>>> IMO it's a bit premature to do this without a working hive-exec jar
> for
> >>>> downstream projects like Spark/Trino/Presto. At the current state
> there
> >> is
> >>>> no way to upgrade these projects to use the fat hive-exec jar.
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich<ki...@rxd.hu>  wrote:
> >>>>
> >>>>> Hey all,
> >>>>>
> >>>>> I wanted to get back to this - but had other things going on.
> >>>>>
> >>>>> Chao> it is still being used today by some other popular projects
> >>>>> the idea is to fix the issues they bump into - because people who
> load
> >> the
> >>>>> jdbc driver may also see those issues.
> >>>>>
> >>>>> Edward> [...] You all must like enjoy shading jars.
> >>>>> I totally agree that they may use a shell action as well.
> >>>>> I wonder how do you propose to solve issues related to clients using
> a
> >>>>> different version of the guava library?
> >>>>>
> >>>>> The changes which will remove the core artifact stuff is ready:
> >>>>> https://github.com/apache/hive/pull/2648
> >>>>>
> >>>>> cheers,
> >>>>> Zoltan
> >>>>>
> >>>>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> >>>>>> recommendation from the Hive team is to use the hive-exec.jar
> >> artifact.
> >>>>>>
> >>>>>> You know about 10 years ago. I mentioned that oozie should just use
> >>>>>> hive-service or hive jdbc. After a big fight where folks kept
> >> bringing up
> >>>>>> concurrency bugs in hive-server-1 my prs were rejected (even though
> >> hive
> >>>>>> server2 would not have these bugs). I still cannot fathom why
> someone
> >>>>> using
> >>>>>> oozie would want a fat jar of hive (as opposed to hive server or
> >>>>> hivejdbc)
> >>>>>> . If I had to do that, i would just use shell action..... You all
> must
> >>>>> like
> >>>>>> enjoy shading jars.
> >>>>>>
> >>>>>> Edward
> >>>>>>
> >>>>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun<su...@apache.org>
> wrote:
> >>>>>>
> >>>>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
> >>>>>>> completely - it is still being used today by some other popular
> >> projects
> >>>>>>> including Spark and Trino/Presto. By sticking to `hive-exec-core`
> it
> >>>>> gives
> >>>>>>> more flexibility to the other projects to shade & relocate those
> >> classes
> >>>>>>> according to their need, without waiting for new Hive releases.
> Hive
> >>>>> also
> >>>>>>> needs to make sure it relocate everything properly. Otherwise, if
> >> some
> >>>>>>> classes are shaded & included in `hive-exec` but not relocated,
> there
> >>>>> is no
> >>>>>>> way for the other projects to exclude them and avoid potential
> >>>>> conflicts.
> >>>>>>> Chao
> >>>>>>>
> >>>>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich<ki...@rxd.hu>
> >> wrote:
> >>>>>>>
> >>>>>>>> Hey
> >>>>>>>>
> >>>>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> >>>>>>>>> Indeed this may lead to binary incompatibility problems as the
> one
> >> you
> >>>>>>>>> mentioned. If I understood correctly the problem you cite comes
> up
> >> if
> >>>>>>>>> library B in this case is not relocated. If Hive systematically
> >>>>>>> relocates
> >>>>>>>>> shaded deps do you think there will still be binary
> incompatibility
> >>>>>>>> issues?
> >>>>>>>>> If the relocating solution works, I would personally prefer going
> >> down
> >>>>>>>> this
> >>>>>>>>> path instead of introducing an entirely new module just for the
> >> sake
> >>>>> of
> >>>>>>>>> dependency management. Most of the time when there are problems
> >> with
> >>>>>>>>> shading the answer comes from relocating the problematic
> >> dependencies
> >>>>>>> and
> >>>>>>>>> people are more or less accustomed with this route.
> >>>>>>>> I totally agree with you Stamatis - with the addition that we
> should
> >>>>> work
> >>>>>>>> together with the owners of other projects to help them use the
> >> correct
> >>>>>>>> artifact to gain access to
> >>>>>>>> Hive's internal parts.
> >>>>>>>> I've opened HIVE-25531 to remove the core classified artifact -
> and
> >>>>>>> ensure
> >>>>>>>> that we will be uncovering and fixing future issues with the
> >> hive-exec
> >>>>>>>> artifact.
> >>>>>>>>
> >>>>>>>> cheers,
> >>>>>>>> Zoltan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Stamatis
> >>>>>>>>>
> >>>>>>>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
> >>>>>>>> <fd...@cloudera.com.invalid>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Dear Hive developers,
> >>>>>>>>>>
> >>>>>>>>>> I am Dan from the Oozie team and I would like to bring up the
> >>>>>>>>>> hive-exec.jar vs. hive-exec-core.jar topic.
> >>>>>>>>>> The reason for that is because as far as we understand the
> >> official
> >>>>>>>>>> recommendation from the Hive team is to use the hive-exec.jar
> >>>>>>> artifact.
> >>>>>>>>>> However in Oozie that can end-up in a binary incompatibility.
> >>>>>>>>>>
> >>>>>>>>>> The reason for that is:
> >>>>>>>>>>
> >>>>>>>>>>       * Let's say library A is included in the fat Jar.
> >>>>>>>>>>
> >>>>>>>>>>       * And library B which is using library A is also included
> in
> >> the
> >>>>>>> fat
> >>>>>>>> Jar.
> >>>>>>>>>>       * Let's also say that library A's com.library.alib
> package is
> >>>>>>>>>>         relocated to org.apache.hive.com.library.alib,
> >>>>>>>>>>         meaning the com.library.alib.SomeClass becomes
> >>>>>>>>>>         org.apache.hive.com.library.alib.SomeClass
> >>>>>>>>>>
> >>>>>>>>>>       * So if B has a method like public void
> >>>>>>>>>>         someMethod(com.library.alib.SomeClass) then the
> signature
> >> of
> >>>>> this
> >>>>>>>>>>         method will be changed to:
> >>>>>>>>>>         public void
> >>>>>>> someMethod(org.apache.hive.com.library.alib.SomeClass)
> >>>>>>>>>>       * If Oozie is also using B directly meaning we'll have
> b.jar
> >> on
> >>>>> our
> >>>>>>>>>>         classpath, but with the unchanged signature,
> >>>>>>>>>>         so when hive-exec tries to invoke someMethod then
> >> depending on
> >>>>>>>>>>         whether b.jar coming from us will be loaded first or
> >> hive-exec
> >>>>>>>> will,
> >>>>>>>>>>         we can end-up with a NoSuchMethodError is hive-exec
> tries
> >> to
> >>>>> pass
> >>>>>>>> an
> >>>>>>>>>>         org.apache.hive.com.library.alib.SomeClass instance to
> the
> >>>>>>>>>>         someMethod which was loaded from the original b.jar.
> >>>>>>>>>>
> >>>>>>>>>> Hence in Oozie a long time ago (OOZIE-2621
> >>>>>>>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the
> decision
> >> was
> >>>>>>>>>> made to use the hive-exec-core Jar.
> >>>>>>>>>>
> >>>>>>>>>> Now since the shading process actually removes those
> dependencies
> >>>>> from
> >>>>>>>>>> the hive-exec pom which are included in the fat Jar, we manually
> >> had
> >>>>>>> to
> >>>>>>>>>> add some dependencies to Oozie to compensate this.
> >>>>>>>>>> However these dependencies are not used by Oozie directly and
> with
> >>>>> the
> >>>>>>>>>> growing features of hive-exec we had to repeat the same process
> >>>>>>>>>> over-and-over which is a bit unmaintainable.
> >>>>>>>>>>
> >>>>>>>>>> Today I'm writing to you to propose a long-term solution where
> >>>>>>> basically
> >>>>>>>>>> nothing would change in the generated hive artifacts, poms and
> the
> >>>>>>> same
> >>>>>>>>>> time we wouldn't have to manually declare dependencies in Oozie
> >> which
> >>>>>>>>>> are not explicitly used by us.
> >>>>>>>>>>
> >>>>>>>>>> The solution:
> >>>>>>>>>>
> >>>>>>>>>>      1. We would create a new module named
> hive-exec-dependencies
> >> which
> >>>>>>>>>>         would be a pom-packaging module without any Java source
> >> files.
> >>>>>>>>>>      2. All the dependencies declared in hive-exec would be
> moved
> >> to
> >>>>>>>>>>         hive-exec-dependencies.
> >>>>>>>>>>      3. We would make the hive-exec-dependencies module the
> parent
> >> of
> >>>>>>>>>>         hive-exec and with this hive-exec would still have
> access
> >> to
> >>>>> the
> >>>>>>>>>>         same dependencies as before.
> >>>>>>>>>>      4. The maven shade plugin would still strip the
> dependencies
> >> from
> >>>>>>> the
> >>>>>>>>>>         generated hive-exec pom which are included in the fat
> Jar.
> >>>>>>>>>>      5. And with a small maven plugin we'd change hive-exec's
> >> parent
> >>>>> back
> >>>>>>>>>>         from hive-exec-dependencies to the root hive project in
> the
> >>>>>>>>>>         generated hive-exec pom file.
> >>>>>>>>>>
> >>>>>>>>>> I have a change ready locally and it works as described above.
> >>>>>>>>>>
> >>>>>>>>>> With this on the Oozie side we could add a dependency on
> >>>>>>>>>> hive-exec-dependencies and hence all the required libraries
> which
> >> are
> >>>>>>>>>> included in the fat Jar would be pulled into Oozie.
> >>>>>>>>>> The next time a new dependency would be added to
> >>>>>>> hive-exec-dependencies,
> >>>>>>>>>> the Oozie build would pull it in automatically without us having
> >> to
> >>>>>>>>>> explicitly declare it.
> >>>>>>>>>>
> >>>>>>>>>> Please let me know what you think.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Dan
> >>>>>>>>>>
> >>>
> >>
> >
>

Re: hive-exec vs. hive-exec:core

Posted by Zoltan Haindrich <ki...@rxd.hu>.

On 11/17/21 7:46 PM, Chao Sun wrote:
>> We have a working hive-exec jar
> 
> I'm not sure about this. The issue comes when the fat hive-exec jar shades
> some jars but doesn't relocate them. In this case there is no way for the
> downstream projects to resolve the conflict.

Exactly - I think those should be hammered out for good; fix the shading/relocation!
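
When chasing this kind of conflict, it can help to check which jar a class was
actually loaded from. The following is a small, generic Java sketch and not
something taken from Hive; the class name ClassOriginDemo and the helper
originOf are made up for illustration, while getProtectionDomain() and
getCodeSource() are standard JDK calls.

```java
import java.security.CodeSource;

public class ClassOriginDemo {
    // Returns the location (typically a jar or directory URL) a class was loaded
    // from, or a marker string when the JVM does not report a code source
    // (for example for classes loaded by the bootstrap class loader).
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // without arguments the demo inspects itself.
        String name = args.length > 0 ? args[0] : ClassOriginDemo.class.getName();
        System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    }
}
```

Running it on the downstream classpath with the name of a class that exists
both in hive-exec and in a separate jar would show which copy wins.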

> 
> On the Spark side IIUC we had issues with Apache Commons as well as ORC
> (see HIVE-25317 for an effort on this), and there could be more. Spark is
> using Hive 2.3 though but the same applies for master/4.0 if dependency
> versions differ between Hive and the downstream projects.

This change is only about master - it won't change Hive 2.3. HIVE-25317 was for branch-2 as well.
I've seen weird stuff in a few places because they were not able to use the hive-exec jar as-is.
Folks in the Impala project, for example, went in a direction to re-shade/re-filter the hive-exec jar and relocate some stuff in it - most likely because it conflicted with their stuff.
https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
Taking a quick look at https://github.com/apache/spark/pull/33989/files it seems like you've also done something similar... but instead of using the base artifact, you have created a new shader.
I don't think this is better than having an artifact which simply works out-of-the-box.


cheers,
Zoltan


Re: hive-exec vs. hive-exec:core

Posted by Chao Sun <su...@apache.org>.
> We have a working hive-exec jar

I'm not sure about this. The issue comes when the fat hive-exec jar shades
some jars but doesn't relocate them. In this case there is no way for the
downstream projects to resolve the conflict.
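
A minimal Java sketch of why the shaded-but-not-relocated case cannot be fixed downstream, assuming a flat classpath where the first jar containing a class name wins. The jar contents, class names and version labels are hypothetical and only model the lookup; no real class loading is involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnrelocatedConflictSketch {

    // Mimics a flat classpath: the first "jar" that contains the class name wins.
    // Each jar maps a fully qualified class name to the library version it came from.
    static String resolve(List<Map<String, String>> classpath, String className) {
        for (Map<String, String> jar : classpath) {
            String version = jar.get(className);
            if (version != null) {
                return version;
            }
        }
        return null; // class not found anywhere on the classpath
    }

    // Builds a jar model from alternating (className, version) pairs.
    static Map<String, String> jar(String... classAndVersionPairs) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (int i = 0; i < classAndVersionPairs.length; i += 2) {
            entries.put(classAndVersionPairs[i], classAndVersionPairs[i + 1]);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical fat jar bundling com.example.lib.Util 1.0 under its original package.
        Map<String, String> fatJar = jar("com.example.lib.Util", "1.0");
        // The downstream project ships the same library at version 2.0.
        Map<String, String> downstreamLib = jar("com.example.lib.Util", "2.0");
        // The same fat jar with the bundled copy relocated under its own prefix.
        Map<String, String> relocatedFatJar = jar("org.example.shaded.com.example.lib.Util", "1.0");

        // Not relocated: the downstream code silently gets the fat jar's 1.0 copy.
        System.out.println(resolve(Arrays.asList(fatJar, downstreamLib), "com.example.lib.Util"));
        // Relocated: the names no longer collide, so downstream gets its own 2.0.
        System.out.println(resolve(Arrays.asList(relocatedFatJar, downstreamLib), "com.example.lib.Util"));
    }
}
```

In the first lookup a dependency exclusion on the downstream side would not help, because the 1.0 classes live inside the fat jar itself rather than in a separate artifact.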

On the Spark side IIUC we had issues with Apache Commons as well as ORC
(see HIVE-25317 for an effort on this), and there could be more. Spark is
using Hive 2.3, but the same applies to master/4.0 if dependency
versions differ between Hive and the downstream projects.

On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> > For Oozie we've decided to use fat Jar downstream (Cloudera) as there we
> > have processes to ensure 3rd-party library versions are kept in sync.
> >
> > Since we don't have such a process in Apache, there we'll continue to
> > use the core Jar.
>
> It might be possible to evade some problems by using a 3rd party lib
> syncer - but if we've done a good job shading this stuff; it should not
> cause any trouble even in case other 3rd party stuff is present....but in
> any case to check things out you will need a Hive release in some form
>
> cheers,
> Zoltan

Re: hive-exec vs. hive-exec:core

Posted by Zoltan Haindrich <ki...@rxd.hu>.
On 11/17/21 7:07 PM, Daniel Fritsi wrote:
> For Oozie we've decided to use fat Jar downstream (Cloudera) as there we have processes to ensure 3rd-party library versions are kept in sync.
> 
> Since we don't have such a process in Apache, there we'll continue to use the core Jar.

It might be possible to evade some problems by using a 3rd-party lib syncer - but if we've done a good job shading this stuff, it should not cause any trouble even when other 3rd-party stuff is present.
In any case, to check things out you will need a Hive release in some form.
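
The failure mode that a consistent relocation is meant to avoid (the library A / library B case from the start of this thread) can be sketched in plain Java with reflection. The nested classes below are stand-ins invented for this example; the lookup by name and exact parameter type only approximates how the JVM links a call site, where a miss surfaces as NoSuchMethodError.

```java
import java.lang.reflect.Method;

public class RelocatedSignatureSketch {

    // Stand-in for com.library.alib.SomeClass as shipped in the original a.jar.
    static class OriginalSomeClass {}

    // Stand-in for the relocated copy, org.apache.hive.com.library.alib.SomeClass.
    static class RelocatedSomeClass {}

    // Stand-in for library B as shipped in the original b.jar: its method still
    // takes the original (non-relocated) parameter type.
    static class OriginalB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Looks up someMethod by name and exact parameter type on the given owner class.
    static boolean canLink(Class<?> owner, Class<?> parameterType) {
        try {
            Method method = owner.getMethod("someMethod", parameterType);
            return method != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A caller compiled against the original A finds the method.
        System.out.println(canLink(OriginalB.class, OriginalSomeClass.class));
        // A caller whose references were rewritten to the relocated A does not,
        // if the original b.jar was loaded first.
        System.out.println(canLink(OriginalB.class, RelocatedSomeClass.class));
    }
}
```

This prints true and then false. If B were relocated together with A, the rewritten caller would look for the relocated B instead and would never resolve against the original b.jar.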

cheers,
Zoltan


Re: hive-exec vs. hive-exec:core

Posted by Daniel Fritsi <fd...@cloudera.com.INVALID>.
For Oozie we've decided to use the fat Jar downstream (Cloudera), as there we
have processes to ensure 3rd-party library versions are kept in sync.

Since we don't have such a process in Apache, we'll continue to
use the core Jar there.

Dan

On 2021. 11. 17. 18:50, Chao Sun wrote:
>> the idea is to fix the issues they bump into - because people who load
>> the jdbc driver may also see those issues.
>
> I don’t get what you mean here, could you elaborate a bit more?
>
> IMO it's a bit premature to do this without a working hive-exec jar for
> downstream projects like Spark/Trino/Presto. At the current state there is
> no way to upgrade these projects to use the fat hive-exec jar.
>
>
>
> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich<ki...@rxd.hu>  wrote:
>
>> Hey all,
>>
>> I wanted to get back to this - but had other things going on.
>>
>> Chao> it is still being used today by some other popular projects
>> the idea is to fix the issues they bump into - because people who load the
>> jdbc driver may also see those issues.
>>
>> Edward> [...] You all must like enjoy shading jars.
>> I totally agree that they may use a shell action as well.
>> I wonder how do you propose to solve issues related to clients using a
>> different version of the guava library?
>>
>> The changes which will remove the core artifact stuff is ready:
>> https://github.com/apache/hive/pull/2648
>>
>> cheers,
>> Zoltan
>>
>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
>>> recommendation from the Hive team is to use the hive-exec.jar artifact.
>>>
>>> You know about 10 years ago. I mentioned that oozie should just use
>>> hive-service or hive jdbc. After a big fight where folks kept bringing up
>>> concurrency bugs in hive-server-1 my prs were rejected (even though hive
>>> server2 would not have these bugs). I still cannot fathom why someone using
>>> oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
>>> If I had to do that, i would just use shell action..... You all must like
>>> enjoy shading jars.
>>>
>>> Edward
>>>
>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun<su...@apache.org>  wrote:
>>>
>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
>>>> completely - it is still being used today by some other popular projects
>>>> including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
>>>> more flexibility to the other projects to shade & relocate those classes
>>>> according to their need, without waiting for new Hive releases. Hive also
>>>> needs to make sure it relocate everything properly. Otherwise, if some
>>>> classes are shaded & included in `hive-exec` but not relocated, there is no
>>>> way for the other projects to exclude them and avoid potential conflicts.
>>>> Chao
>>>>
>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich<ki...@rxd.hu>  wrote:
>>>>
>>>>> Hey
>>>>>
>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
>>>>>> Indeed this may lead to binary incompatibility problems as the one you
>>>>>> mentioned. If I understood correctly the problem you cite comes up if
>>>>>> library B in this case is not relocated. If Hive systematically relocates
>>>>>> shaded deps do you think there will still be binary incompatibility issues?
>>>>>>
>>>>>> If the relocating solution works, I would personally prefer going down this
>>>>>> path instead of introducing an entirely new module just for the sake of
>>>>>> dependency management. Most of the time when there are problems with
>>>>>> shading the answer comes from relocating the problematic dependencies and
>>>>>> people are more or less accustomed with this route.
>>>>>
>>>>> I totally agree with you Stamatis - with the addition that we should work
>>>>> together with the owners of other projects to help them use the correct
>>>>> artifact to gain access to Hive's internal parts.
>>>>> I've opened HIVE-25531 to remove the core classified artifact - and ensure
>>>>> that we will be uncovering and fixing future issues with the hive-exec
>>>>> artifact.
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>>
>>>>>> Best,
>>>>>> Stamatis
>>>>>>
>>>>>> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
>>>>> <fd...@cloudera.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear Hive developers,
>>>>>>>
>>>>>>> I am Dan from the Oozie team and I would like to bring up the
>>>>>>> hive-exec.jar vs. hive-exec-core.jar topic.
>>>>>>> The reason for that is because as far as we understand the official
>>>>>>> recommendation from the Hive team is to use the hive-exec.jar
>>>> artifact.
>>>>>>> However in Oozie that can end-up in a binary incompatibility.
>>>>>>>
>>>>>>> The reason for that is:
>>>>>>>
>>>>>>>      * Let's say library A is included in the fat Jar.
>>>>>>>
>>>>>>>      * And library B which is using library A is also included in the
>>>> fat
>>>>> Jar.
>>>>>>>      * Let's also say that library A's com.library.alib package is
>>>>>>>        relocated to org.apache.hive.com.library.alib,
>>>>>>>        meaning the com.library.alib.SomeClass becomes
>>>>>>>        org.apache.hive.com.library.alib.SomeClass
>>>>>>>
>>>>>>>      * So if B has a method like public void
>>>>>>>        someMethod(com.library.alib.SomeClass) then the signature of
>> this
>>>>>>>        method will be changed to:
>>>>>>>        public void
>>>> someMethod(org.apache.hive.com.library.alib.SomeClass)
>>>>>>>      * If Oozie is also using B directly meaning we'll have b.jar on
>> our
>>>>>>>        classpath, but with the unchanged signature,
>>>>>>>        so when hive-exec tries to invoke someMethod then depending on
>>>>>>>        whether b.jar coming from us will be loaded first or hive-exec
>>>>> will,
>>>>>>>        we can end-up with a NoSuchMethodError is hive-exec tries to
>> pass
>>>>> an
>>>>>>>        org.apache.hive.com.library.alib.SomeClass instance to the
>>>>>>>        someMethod which was loaded from the original b.jar.
>>>>>>>
>>>>>>> Hence in Oozie a long time ago (OOZIE-2621
>>>>>>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
>>>>>>> made to use the hive-exec-core Jar.
>>>>>>>
>>>>>>> Now since the shading process actually removes those dependencies
>> from
>>>>>>> the hive-exec pom which are included in the fat Jar, we manually had
>>>> to
>>>>>>> add some dependencies to Oozie to compensate this.
>>>>>>> However these dependencies are not used by Oozie directly and with
>> the
>>>>>>> growing features of hive-exec we had to repeat the same process
>>>>>>> over-and-over which is a bit unmaintainable.
>>>>>>>
>>>>>>> Today I'm writing to you to propose a long-term solution where
>>>> basically
>>>>>>> nothing would change in the generated hive artifacts, poms and the
>>>> same
>>>>>>> time we wouldn't have to manually declare dependencies in Oozie which
>>>>>>> are not explicitly used by us.
>>>>>>>
>>>>>>> The solution:
>>>>>>>
>>>>>>>     1. We would create a new module named hive-exec-dependencies which
>>>>>>>        would be a pom-packaging module without any Java source files.
>>>>>>>     2. All the dependencies declared in hive-exec would be moved to
>>>>>>>        hive-exec-dependencies.
>>>>>>>     3. We would make the hive-exec-dependencies module the parent of
>>>>>>>        hive-exec and with this hive-exec would still have access to
>> the
>>>>>>>        same dependencies as before.
>>>>>>>     4. The maven shade plugin would still strip the dependencies from
>>>> the
>>>>>>>        generated hive-exec pom which are included in the fat Jar.
>>>>>>>     5. And with a small maven plugin we'd change hive-exec's parent
>> back
>>>>>>>        from hive-exec-dependencies to the root hive project in the
>>>>>>>        generated hive-exec pom file.
>>>>>>>
>>>>>>> I have a change ready locally and it works as described above.
>>>>>>>
>>>>>>> With this on the Oozie side we could add a dependency on
>>>>>>> hive-exec-dependencies and hence all the required libraries which are
>>>>>>> included in the fat Jar would be pulled into Oozie.
>>>>>>> The next time a new dependency would be added to
>>>> hive-exec-dependencies,
>>>>>>> the Oozie build would pull it in automatically without us having to
>>>>>>> explicitly declare it.
>>>>>>>
>>>>>>> Please let me know what you think.
>>>>>>>
>>>>>>> Best,
>>>>>>> Dan
>>>>>>>

Re: hive-exec vs. hive-exec:core

Posted by Zoltan Haindrich <ki...@rxd.hu>.

On 11/17/21 6:50 PM, Chao Sun wrote:
>> the idea is to fix the issues they bump into - because people who load
>> the jdbc driver may also see those issues.
> 
> I don’t get what you mean here, could you elaborate a bit more?

I suggest working with the people from the downstream projects to iron out the issues, if there are any.
I'll be here and open to helping with that.

> IMO it's a bit premature to do this without a working hive-exec jar for
> downstream projects like Spark/Trino/Presto. At the current state there is
> no way to upgrade these projects to use the fat hive-exec jar.

We have a working hive-exec jar - most of the problems were caused by:
* the incorrectly shaded guava lib we had in hive-exec, with invalid relocation instructions
* similarly incorrectly shaded jackson 1.x
These issues are fixed on master - but since that fix has never been released, downstream projects have not yet been able to migrate to it.
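
To make the failure mode concrete, here is a toy sketch of why a wrong or partial relocation breaks callers. It reuses the `com.library.alib` / `someMethod` example from the start of this thread; the class `RelocationSketch` and its helper methods are hypothetical and only model the effect on method descriptors. This is not how maven-shade-plugin is implemented.

```java
import java.util.Collections;
import java.util.Set;

/** Toy model only: NOT the maven-shade-plugin implementation. */
final class RelocationSketch {

    /** Rewrites object types in a JVM method descriptor from one package to another. */
    static String relocate(String descriptor, String fromPackage, String toPackage) {
        String from = "L" + fromPackage.replace('.', '/') + "/";
        String to = "L" + toPackage.replace('.', '/') + "/";
        return descriptor.replace(from, to);
    }

    /** A call links only if the callee exposes exactly the descriptor the caller was compiled against. */
    static boolean links(String expectedByCaller, Set<String> exposedByCallee) {
        return exposedByCallee.contains(expectedByCaller);
    }

    public static void main(String[] args) {
        // Descriptor of B.someMethod(com.library.alib.SomeClass) as shipped in the plain b.jar.
        String plain = "someMethod(Lcom/library/alib/SomeClass;)V";
        // The same method after library A's package was relocated inside the fat jar.
        String relocated = relocate(plain, "com.library.alib", "org.apache.hive.com.library.alib");
        System.out.println(relocated);
        // Code inside the fat jar expects the relocated descriptor ...
        System.out.println("against fat-jar copy of B: "
                + links(relocated, Collections.singleton(relocated)));
        // ... so it cannot link against an unshaded B that is found first on the classpath.
        System.out.println("against plain b.jar: "
                + links(relocated, Collections.singleton(plain)));
    }
}
```

The second check is the situation that surfaces as a NoSuchMethodError at runtime: the caller and the callee disagree on the descriptor because only one of them went through the relocation.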

I don't think we should keep something which could easily cause problems during usage - so we should remove the core artifact for good.

cheers,
Zoltan

> 
> 
> 
> On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> 
>> Hey all,
>>
>> I wanted to get back to this - but had other things going on.
>>
>> Chao> it is still being used today by some other popular projects
>> the idea is to fix the issues they bump into - because people who load the
>> jdbc driver may also see those issues.
>>
>> Edward> [...] You all must like enjoy shading jars.
>> I totally agree that they may use a shell action as well.
>> I wonder how do you propose to solve issues related to clients using a
>> different version of the guava library?
>>
>> The changes which will remove the core artifact stuff is ready:
>> https://github.com/apache/hive/pull/2648
>>
>> cheers,
>> Zoltan
>>

Re: hive-exec vs. hive-exec:core

Posted by Chao Sun <su...@apache.org>.
> the idea is to fix the issues they bump into - because people who load
> the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?

IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. In the current state there is
no way to upgrade these projects to use the fat hive-exec jar.



On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hey all,
>
> I wanted to get back to this - but had other things going on.
>
> Chao> it is still being used today by some other popular projects
> the idea is to fix the issues they bump into - because people who load the
> jdbc driver may also see those issues.
>
> Edward> [...] You all must like enjoy shading jars.
> I totally agree that they may use a shell action as well.
> I wonder how do you propose to solve issues related to clients using a
> different version of the guava library?
>
> The changes which will remove the core artifact stuff is ready:
> https://github.com/apache/hive/pull/2648
>
> cheers,
> Zoltan
>

Re: hive-exec vs. hive-exec:core

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load the jdbc driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how you propose to solve issues related to clients using a different version of the guava library?
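
When chasing this kind of version conflict, it can help to check which jar actually supplied a given class at runtime. The sketch below is a minimal diagnostic that uses only standard JDK calls; the class name `ClassOrigin` and the placeholder string it returns are made up for illustration.

```java
import java.security.CodeSource;

/** Minimal diagnostic: reports where a class was loaded from. */
final class ClassOrigin {

    /** Returns the jar or directory URL of the class, or a placeholder for JDK/bootstrap classes. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + originOf(Class.forName(name)));
    }
}
```

Run with, for example, `com.google.common.collect.ImmutableList` as the argument on the classpath in question: an unrelocated Guava bundled in a fat jar would show up as coming from that jar rather than from the Guava jar the client expects.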

The change that removes the core artifact is ready: https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:
> recommendation from the Hive team is to use the hive-exec.jar artifact.
> 
> You know about 10 years ago. I mentioned that oozie should just use
> hive-service or hive jdbc. After a big fight where folks kept bringing up
> concurrency bugs in hive-server-1 my prs were rejected (even though hive
> server2 would not have these bugs). I still cannot fathom why someone using
> oozie would want a fat jar of hive (as opposed to hive server or hivejdbc)
> . If I had to do that, i would just use shell action..... You all must like
> enjoy shading jars.
> 
> Edward
> 

Re: hive-exec vs. hive-exec:core

Posted by Edward Capriolo <ed...@gmail.com>.
> recommendation from the Hive team is to use the hive-exec.jar artifact.

You know, about 10 years ago I mentioned that Oozie should just use
hive-service or Hive JDBC. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1, my PRs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
Oozie would want a fat jar of Hive (as opposed to hive server or Hive JDBC).
If I had to do that, I would just use a shell action. You all must enjoy
shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun <su...@apache.org> wrote:

> I'm not sure whether it is a good idea to remove `hive-exec-core`
> completely - it is still being used today by some other popular projects
> including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
> more flexibility to the other projects to shade & relocate those classes
> according to their need, without waiting for new Hive releases. Hive also
> needs to make sure it relocate everything properly. Otherwise, if some
> classes are shaded & included in `hive-exec` but not relocated, there is no
> way for the other projects to exclude them and avoid potential conflicts.
>
> Chao
>
> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>
> > Hey
> >
> > On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > > Indeed this may lead to binary incompatibility problems as the one you
> > > mentioned. If I understood correctly the problem you cite comes up if
> > > library B in this case is not relocated. If Hive systematically
> relocates
> > > shaded deps do you think there will still be binary incompatibility
> > issues?
> > >
> > > If the relocating solution works, I would personally prefer going down
> > this
> > > path instead of introducing an entirely new module just for the sake of
> > > dependency management. Most of the time when there are problems with
> > > shading the answer comes from relocating the problematic dependencies
> and
> > > people are more or less accustomed with this route.
> >
> > I totally agree with you Stamatis - with the addition that we should work
> > together with the owners of other projects to help them use the correct
> > artifact to gain access to
> > Hive's internal parts.
> > I've opened HIVE-25531 to remove the core classified artifact - and
> ensure
> > that we will be uncovering and fixing future issues with the hive-exec
> > artifact.
> >
> > cheers,
> > Zoltan
> >
> >
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi
> > <fd...@cloudera.com.invalid>
> > > wrote:
> > >
> > >> Dear Hive developers,
> > >>
> > >> I am Dan from the Oozie team and I would like to bring up the
> > >> hive-exec.jar vs. hive-exec-core.jar topic.
> > >> The reason for that is because as far as we understand the official
> > > >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> > >>
> > >> However in Oozie that can end-up in a binary incompatibility.
> > >>
> > >> The reason for that is:
> > >>
> > >>    * Let's say library A is included in the fat Jar.
> > >>
> > > >>    * And library B which is using library A is also included in the fat Jar.
> > >>
> > >>    * Let's also say that library A's com.library.alib package is
> > >>      relocated to org.apache.hive.com.library.alib,
> > >>      meaning the com.library.alib.SomeClass becomes
> > >>      org.apache.hive.com.library.alib.SomeClass
> > >>
> > >>    * So if B has a method like public void
> > >>      someMethod(com.library.alib.SomeClass) then the signature of this
> > >>      method will be changed to:
> > > >>      public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> > >>
> > > >>    * If Oozie is also using B directly meaning we'll have b.jar on our
> > > >>      classpath, but with the unchanged signature,
> > > >>      so when hive-exec tries to invoke someMethod then depending on
> > > >>      whether b.jar coming from us will be loaded first or hive-exec will,
> > > >>      we can end up with a NoSuchMethodError if hive-exec tries to pass an
> > > >>      org.apache.hive.com.library.alib.SomeClass instance to the
> > > >>      someMethod which was loaded from the original b.jar.
> > >>
> > >> Hence in Oozie a long time ago (OOZIE-2621
> > >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> > >> made to use the hive-exec-core Jar.
> > >>
> > >> Now since the shading process actually removes those dependencies from
> > > >> the hive-exec pom which are included in the fat Jar, we manually had to
> > > >> add some dependencies to Oozie to compensate this.
> > >> However these dependencies are not used by Oozie directly and with the
> > >> growing features of hive-exec we had to repeat the same process
> > >> over-and-over which is a bit unmaintainable.
> > >>
> > > >> Today I'm writing to you to propose a long-term solution where basically
> > > >> nothing would change in the generated hive artifacts, poms and the same
> > > >> time we wouldn't have to manually declare dependencies in Oozie which
> > > >> are not explicitly used by us.
> > >>
> > >> The solution:
> > >>
> > >>   1. We would create a new module named hive-exec-dependencies which
> > >>      would be a pom-packaging module without any Java source files.
> > >>   2. All the dependencies declared in hive-exec would be moved to
> > >>      hive-exec-dependencies.
> > >>   3. We would make the hive-exec-dependencies module the parent of
> > >>      hive-exec and with this hive-exec would still have access to the
> > >>      same dependencies as before.
> > > >>   4. The maven shade plugin would still strip the dependencies from the
> > > >>      generated hive-exec pom which are included in the fat Jar.
> > >>   5. And with a small maven plugin we'd change hive-exec's parent back
> > >>      from hive-exec-dependencies to the root hive project in the
> > >>      generated hive-exec pom file.
> > >>
> > >> I have a change ready locally and it works as described above.
> > >>
> > >> With this on the Oozie side we could add a dependency on
> > >> hive-exec-dependencies and hence all the required libraries which are
> > >> included in the fat Jar would be pulled into Oozie.
> > > >> The next time a new dependency would be added to hive-exec-dependencies,
> > >> the Oozie build would pull it in automatically without us having to
> > >> explicitly declare it.
> > >>
> > >> Please let me know what you think.
> > >>
> > >> Best,
> > >> Dan
> > >>
> > >
> >
>

Re: hive-exec vs. hive-exec:core

Posted by Chao Sun <su...@apache.org>.
I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. Sticking to `hive-exec-core` gives the
other projects more flexibility to shade & relocate those classes
according to their needs, without waiting for new Hive releases. Hive also
needs to make sure it relocates everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.
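
To illustrate the last point, here is a rough sketch of why a class that is
bundled but not relocated cannot be avoided by a downstream project. It
assumes a flat classpath where the first jar containing a class wins; real
class loader hierarchies can behave differently. The jar name alib-2.0.jar
and the ClasspathModel helper are hypothetical, and the class names are the
placeholders from Dan's original mail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ClasspathModel {
    // Returns the name of the first jar that contains className, or null if none does.
    // This mimics first-match-wins lookup on a flat classpath (an assumption of this sketch).
    static String resolve(Map<String, Set<String>> classpath, String className) {
        for (Map.Entry<String, Set<String>> jar : classpath.entrySet()) {
            if (jar.getValue().contains(className)) {
                return jar.getKey();
            }
        }
        return null;
    }

    // Builds a two-jar classpath: the fat jar first, then the consumer's own copy of library A.
    // classBundledInFatJar is the name under which the fat jar ships library A's class.
    static Map<String, Set<String>> classpathWith(String classBundledInFatJar) {
        Map<String, Set<String>> cp = new LinkedHashMap<>();
        cp.put("hive-exec.jar", Set.of(classBundledInFatJar));
        cp.put("alib-2.0.jar", Set.of("com.library.alib.SomeClass"));
        return cp;
    }

    public static void main(String[] args) {
        String wanted = "com.library.alib.SomeClass";
        // Not relocated: the consumer's lookup is answered by the copy inside the fat jar.
        System.out.println(resolve(classpathWith("com.library.alib.SomeClass"), wanted));
        // Relocated: the fat jar no longer claims the original name, so the consumer's jar is used.
        System.out.println(resolve(classpathWith("org.apache.hive.com.library.alib.SomeClass"), wanted));
    }
}
```

In this model the first lookup resolves to hive-exec.jar and the second to
alib-2.0.jar. A Maven exclusion does not help in the first case because the
class is physically inside the fat jar rather than being a declared
dependency.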

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hey
>
> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > > Indeed this may lead to binary incompatibility problems as the one you
> > > mentioned. If I understood correctly the problem you cite comes up if
> > > library B in this case is not relocated. If Hive systematically relocates
> > > shaded deps do you think there will still be binary incompatibility issues?
> > >
> > > If the relocating solution works, I would personally prefer going down this
> > > path instead of introducing an entirely new module just for the sake of
> > > dependency management. Most of the time when there are problems with
> > > shading the answer comes from relocating the problematic dependencies and
> > > people are more or less accustomed with this route.
>
> > I totally agree with you Stamatis - with the addition that we should work
> > together with the owners of other projects to help them use the correct
> > artifact to gain access to Hive's internal parts.
> I've opened HIVE-25531 to remove the core classified artifact - and ensure
> that we will be uncovering and fixing future issues with the hive-exec
> artifact.
>
> cheers,
> Zoltan
>
>
> >
> > Best,
> > Stamatis
> >
> > > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fd...@cloudera.com.invalid>
> > > wrote:
> >
> >> Dear Hive developers,
> >>
> >> I am Dan from the Oozie team and I would like to bring up the
> >> hive-exec.jar vs. hive-exec-core.jar topic.
> >> The reason for that is because as far as we understand the official
> >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> >>
> >> However in Oozie that can end-up in a binary incompatibility.
> >>
> >> The reason for that is:
> >>
> >>    * Let's say library A is included in the fat Jar.
> >>
> > >>    * And library B which is using library A is also included in the fat Jar.
> >>
> >>    * Let's also say that library A's com.library.alib package is
> >>      relocated to org.apache.hive.com.library.alib,
> >>      meaning the com.library.alib.SomeClass becomes
> >>      org.apache.hive.com.library.alib.SomeClass
> >>
> >>    * So if B has a method like public void
> >>      someMethod(com.library.alib.SomeClass) then the signature of this
> >>      method will be changed to:
> >>      public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> >>
> > >>    * If Oozie is also using B directly meaning we'll have b.jar on our
> > >>      classpath, but with the unchanged signature,
> > >>      so when hive-exec tries to invoke someMethod then depending on
> > >>      whether b.jar coming from us will be loaded first or hive-exec will,
> > >>      we can end up with a NoSuchMethodError if hive-exec tries to pass an
> > >>      org.apache.hive.com.library.alib.SomeClass instance to the
> > >>      someMethod which was loaded from the original b.jar.
> >>
> >> Hence in Oozie a long time ago (OOZIE-2621
> >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> >> made to use the hive-exec-core Jar.
> >>
> >> Now since the shading process actually removes those dependencies from
> >> the hive-exec pom which are included in the fat Jar, we manually had to
> >> add some dependencies to Oozie to compensate this.
> >> However these dependencies are not used by Oozie directly and with the
> >> growing features of hive-exec we had to repeat the same process
> >> over-and-over which is a bit unmaintainable.
> >>
> >> Today I'm writing to you to propose a long-term solution where basically
> >> nothing would change in the generated hive artifacts, poms and the same
> >> time we wouldn't have to manually declare dependencies in Oozie which
> >> are not explicitly used by us.
> >>
> >> The solution:
> >>
> >>   1. We would create a new module named hive-exec-dependencies which
> >>      would be a pom-packaging module without any Java source files.
> >>   2. All the dependencies declared in hive-exec would be moved to
> >>      hive-exec-dependencies.
> >>   3. We would make the hive-exec-dependencies module the parent of
> >>      hive-exec and with this hive-exec would still have access to the
> >>      same dependencies as before.
> >>   4. The maven shade plugin would still strip the dependencies from the
> >>      generated hive-exec pom which are included in the fat Jar.
> >>   5. And with a small maven plugin we'd change hive-exec's parent back
> >>      from hive-exec-dependencies to the root hive project in the
> >>      generated hive-exec pom file.
> >>
> >> I have a change ready locally and it works as described above.
> >>
> >> With this on the Oozie side we could add a dependency on
> >> hive-exec-dependencies and hence all the required libraries which are
> >> included in the fat Jar would be pulled into Oozie.
> >> The next time a new dependency would be added to hive-exec-dependencies,
> >> the Oozie build would pull it in automatically without us having to
> >> explicitly declare it.
> >>
> >> Please let me know what you think.
> >>
> >> Best,
> >> Dan
> >>
> >
>

Re: hive-exec vs. hive-exec:core

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> Indeed this may lead to binary incompatibility problems as the one you
> mentioned. If I understood correctly the problem you cite comes up if
> library B in this case is not relocated. If Hive systematically relocates
> shaded deps do you think there will still be binary incompatibility issues?
> 
> If the relocating solution works, I would personally prefer going down this
> path instead of introducing an entirely new module just for the sake of
> dependency management. Most of the time when there are problems with
> shading the answer comes from relocating the problematic dependencies and
> people are more or less accustomed with this route.

I totally agree with you Stamatis - with the addition that we should work
together with the owners of other projects to help them use the correct
artifact to gain access to Hive's internal parts.
I've opened HIVE-25531 to remove the core classified artifact - and ensure
that we will be uncovering and fixing future issues with the hive-exec
artifact.
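
For anyone following along, the incompatibility under discussion can be
reproduced in miniature. The sketch below is only an illustration, not code
from Hive or Oozie: every class name is a hypothetical stand-in, and it uses
a reflective lookup (NoSuchMethodException) as an analogue of the
NoSuchMethodError the JVM raises at link time.

```java
import java.lang.reflect.Method;

// Stand-ins for library A's class before and after relocation (hypothetical names).
class OriginalSomeClass {}
class RelocatedSomeClass {}

// B as published in b.jar: the parameter type lives in the original package.
class OriginalB {
    public void someMethod(OriginalSomeClass arg) {}
}

// B as rewritten inside the fat jar: its parameter type now points at the relocated class.
class ShadedB {
    public void someMethod(RelocatedSomeClass arg) {}
}

final class RelocationDemo {
    // Returns true if owner has a public someMethod whose single parameter is exactly paramType.
    static boolean hasSomeMethod(Class<?> owner, Class<?> paramType) {
        try {
            Method m = owner.getMethod("someMethod", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // Reflective analogue of the NoSuchMethodError seen when linking against the wrong B.
            return false;
        }
    }

    public static void main(String[] args) {
        // Shaded caller paired with shaded B: the relocated signature exists.
        System.out.println(hasSomeMethod(ShadedB.class, RelocatedSomeClass.class));
        // Shaded caller paired with the original b.jar: the relocated signature is missing.
        System.out.println(hasSomeMethod(OriginalB.class, RelocatedSomeClass.class));
    }
}
```

The second lookup fails because the parameter type is part of the method
signature. If B itself were relocated as well, shaded callers would only
ever reference the relocated B and would not pick up the original b.jar.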

cheers,
Zoltan


> 
> Best,
> Stamatis
> 
> On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fd...@cloudera.com.invalid>
> wrote:
> 
>> Dear Hive developers,
>>
>> I am Dan from the Oozie team and I would like to bring up the
>> hive-exec.jar vs. hive-exec-core.jar topic.
>> The reason for that is because as far as we understand the official
>> recommendation from the Hive team is to use the hive-exec.jar artifact.
>>
>> However in Oozie that can end-up in a binary incompatibility.
>>
>> The reason for that is:
>>
>>    * Let's say library A is included in the fat Jar.
>>
>>    * And library B which is using library A is also included in the fat Jar.
>>
>>    * Let's also say that library A's com.library.alib package is
>>      relocated to org.apache.hive.com.library.alib,
>>      meaning the com.library.alib.SomeClass becomes
>>      org.apache.hive.com.library.alib.SomeClass
>>
>>    * So if B has a method like public void
>>      someMethod(com.library.alib.SomeClass) then the signature of this
>>      method will be changed to:
>>      public void someMethod(org.apache.hive.com.library.alib.SomeClass)
>>
>>    * If Oozie is also using B directly meaning we'll have b.jar on our
>>      classpath, but with the unchanged signature,
>>      so when hive-exec tries to invoke someMethod then depending on
>>      whether b.jar coming from us will be loaded first or hive-exec will,
>>      we can end up with a NoSuchMethodError if hive-exec tries to pass an
>>      org.apache.hive.com.library.alib.SomeClass instance to the
>>      someMethod which was loaded from the original b.jar.
>>
>> Hence in Oozie a long time ago (OOZIE-2621
>> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
>> made to use the hive-exec-core Jar.
>>
>> Now since the shading process actually removes those dependencies from
>> the hive-exec pom which are included in the fat Jar, we manually had to
>> add some dependencies to Oozie to compensate this.
>> However these dependencies are not used by Oozie directly and with the
>> growing features of hive-exec we had to repeat the same process
>> over-and-over which is a bit unmaintainable.
>>
>> Today I'm writing to you to propose a long-term solution where basically
>> nothing would change in the generated hive artifacts, poms and the same
>> time we wouldn't have to manually declare dependencies in Oozie which
>> are not explicitly used by us.
>>
>> The solution:
>>
>>   1. We would create a new module named hive-exec-dependencies which
>>      would be a pom-packaging module without any Java source files.
>>   2. All the dependencies declared in hive-exec would be moved to
>>      hive-exec-dependencies.
>>   3. We would make the hive-exec-dependencies module the parent of
>>      hive-exec and with this hive-exec would still have access to the
>>      same dependencies as before.
>>   4. The maven shade plugin would still strip the dependencies from the
>>      generated hive-exec pom which are included in the fat Jar.
>>   5. And with a small maven plugin we'd change hive-exec's parent back
>>      from hive-exec-dependencies to the root hive project in the
>>      generated hive-exec pom file.
>>
>> I have a change ready locally and it works as described above.
>>
>> With this on the Oozie side we could add a dependency on
>> hive-exec-dependencies and hence all the required libraries which are
>> included in the fat Jar would be pulled into Oozie.
>> The next time a new dependency would be added to hive-exec-dependencies,
>> the Oozie build would pull it in automatically without us having to
>> explicitly declare it.
>>
>> Please let me know what you think.
>>
>> Best,
>> Dan
>>
>