Posted to dev@hive.apache.org by Konstantin Boudnik <co...@apache.org> on 2013/08/11 07:08:50 UTC

custom Hive artifacts for Shark project

Guys,

I am trying to help the Spark/Shark community (spark-project.org and now
http://incubator.apache.org/projects/spark) with a predicament. Shark, also
known as Hive on Spark, uses some parts of Hive, i.e. the HQL parser, query
optimizer, serdes, and codecs.

To address some known performance and/or concurrency issues, Shark developers
need to apply a couple of patches on top of stock Hive:
   https://issues.apache.org/jira/browse/HIVE-2891
   https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk)
(as per https://github.com/amplab/shark/wiki/Hive-Patches)

The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11
work is underway), and having developers apply the patches and build their
own version of Hive is an extra step that can be avoided.

One way to address it is to publish Shark-specific versions of the Hive
artifacts with all needed patches applied to the stock release. This way
downstream projects can simply reference the org.apache.hive artifacts at
version 0.9.0-shark-0.7 instead of building Hive locally every time.

Perhaps this approach is a little overkill. Alternatively, would the Hive
community be willing to consider a maintenance release of Hive 0.9.1, and
perhaps 0.11.1, to include the fixes needed by the Shark project?

I am willing to step up and produce Hive release bits if any of the committers
here can help with publishing.

-- 
Thanks in advance,
	Cos


Re: custom Hive artifacts for Shark project

Posted by Konstantin Boudnik <co...@apache.org>.
Hi Edward,

Shark uses two jar files from Hive: hive-common and hive-cli. But the Shark
community applies a few patches on top of stock Hive to fix blocking issues
in the latter. The changes aren't proprietary; they are either backports from
newer releases or fixes that weren't committed yet (HIVE-3772 is a good
example of this).

Take Hive 0.9, which Shark 0.7 uses, as an example. Shark backports a few
bugfixes that were committed to Hive 0.10 or Hive 0.11 but never made it
into Hive 0.9. I believe this is a side effect of Hive always moving forward
and (almost) never making maintenance releases.

Changes, and especially massive rewrites, bring instability into software,
which needs to be gradually ironed out in subsequent releases. A good example
of a project that does this is HBase, which makes quite a number of minor
releases to provide its users with stable and robust server-side software. In
the absence of maintenance releases, downstream projects tend to find ways to
work around such an obstacle. Hence my earlier email.

As for 0.11.1: Shark currently doesn't support Hive 0.11 because of significant
changes in the APIs of the latter. Support is coming in the next couple of
months. So publishing artifacts built on top of Hive 0.9 might be the more
pressing issue.

Hope this clarifies the situation,
  Cos

On Sun, Aug 25, 2013 at 11:54PM, Edward Capriolo wrote:
> I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive?
> Do you just include Hive components from Maven, or does the project somehow
> incorporate our build infrastructure?

Re: custom Hive artifacts for Shark project

Posted by Edward Capriolo <ed...@gmail.com>.
I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive?
Do you just include Hive components from Maven, or does the project somehow
incorporate our build infrastructure?


On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Guys,
>
> Considering the absence of input, I take it that it really doesn't matter
> which way the custom artifacts are published. Is that a correct impression?
>
> My first choice would be
>     org.apache.hive.hive-common;0.9-shark0.7
>     org.apache.hive.hive-cli;0.9-shark0.7
> artifacts.
> If this meets with objections from the community here, then I'd like to
> proceed with
>     org.shark-project.hive-common;0.9.0
>     org.shark-project.hive-cli;0.9.0
>
> Either way, the artifacts should be published to Maven Central to make them
> readily available to the development community.
>
> Thoughts?
> Regards,
>   Cos

Re: custom Hive artifacts for Shark project

Posted by Konstantin Boudnik <co...@apache.org>.
Guys,

Considering the absence of input, I take it that it really doesn't matter
which way the custom artifacts are published. Is that a correct impression?

My first choice would be
    org.apache.hive.hive-common;0.9-shark0.7
    org.apache.hive.hive-cli;0.9-shark0.7
artifacts.
If this meets with objections from the community here, then I'd like to
proceed with
    org.shark-project.hive-common;0.9.0
    org.shark-project.hive-cli;0.9.0

Either way, the artifacts should be published to Maven Central to make them
readily available to the development community.
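
As an illustration only, the two naming options above might look like this in
a downstream sbt build. This is a sketch under assumptions: neither set of
coordinates is confirmed to be published anywhere, and sbt is just one
possible consumer (a Maven or Ivy build would declare the same group,
artifact, and version).

```scala
// build.sbt fragment (sketch). The coordinates are taken from the proposal
// above and are hypothetical: they resolve only if such artifacts are
// actually published to a repository the build can reach.

// Option 1: keep the org.apache.hive group and mark the patched build
// in the version string.
libraryDependencies ++= Seq(
  "org.apache.hive" % "hive-common" % "0.9-shark0.7",
  "org.apache.hive" % "hive-cli"    % "0.9-shark0.7"
)

// Option 2: keep the stock version number and move to a Shark-owned group.
// libraryDependencies ++= Seq(
//   "org.shark-project" % "hive-common" % "0.9.0",
//   "org.shark-project" % "hive-cli"    % "0.9.0"
// )
```

Either form would let a downstream build pick up the patched jars without
building Hive locally.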

Thoughts?
Regards,
  Cos
