
Posted to dev@hive.apache.org by Brock Noland <br...@cloudera.com> on 2013/08/07 21:06:05 UTC

Re: [Discuss] project chop up

Thus far there hasn't been any dissent to managing our modules with Maven.
In addition, there have been several positive comments on a move towards
Maven. I'd like to add that Ivy seems to have issues managing multiple
versions of libraries. For example, in HIVE-3632 the Ivy cache had to be
cleared when testing patches that installed the new version of DataNucleus.
I have had the same issue on HIVE-4388. Requiring the deletion of the Ivy
cache is extremely painful for developers who don't have access to
high-bandwidth connections or who live in areas far from California, where
most of these jars are hosted.

I'd like to propose we move towards Maven.


On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam <mi...@yahoo.com> wrote:

>
>
> Yes, the Hive build and test cases got convoluted as the project scope
> gradually increased. This is the time to take action!
>
> Based on my other Apache experiences, I prefer option #3, "Breakup the
> projects within our own source tree". Make multiple modules or
> sub-projects. By default, only key modules will be built.
>
> Maven could be a possible candidate.
>
> Regards,
> Mohammad
>
>
>
> ________________________________
>  From: Edward Capriolo <ed...@gmail.com>
> To: "dev@hive.apache.org" <de...@hive.apache.org>
> Sent: Saturday, July 27, 2013 7:03 AM
> Subject: Re: [Discuss] project chop up
>
>
> Or feel free to suggest a different approach. I am used to managing
> software as multi-module Maven projects.
> From a development standpoint, if I were working on beeline, it would be
> nice to only need some of the sub-projects open in my IDE to do that.
> Also, managing everything globally is not ideal.
>
> Hive's project layout, build, and test infrastructure is just funky. It has
> to do a few interesting things (shims, testing), but I do not think what we
> are doing justifies the massive ant build system we have. Ant is so ten
> years ago.
>
>
>
> On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates <ga...@hortonworks.com> wrote:
>
> > But I assume they'd still be a part of targets like package, tar, and
> > binary?  Making them compile and test separately and explicitly load the
> > core Hive jars from maven/ivy seems reasonable.
> >
> > Alan.
> >
> > On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
> >
> > > Hi,
> > >
> > > I think that's part of it, but I'd like to decouple the downstream
> > > projects even further so that the only connection is the dependency
> > > on the hive jars.
> > >
> > > Brock
> > > On Jul 26, 2013 10:10 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
> > >
> > >> I'm not sure how this is different from what hcat does today. It needs
> > >> Hive's jars to compile, so it's one of the last things in the compile
> > >> step. Would moving the other modules you note to be in the same
> > >> category be enough? Did you want to also make it so that the default
> > >> ant target doesn't compile those?
> > >>
> > >> Alan.
> > >>
> > >> On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
> > >>
> > >>> My mistake on saying hcat was a fork of the metastore. I had a brain
> > >>> fart for a moment.
> > >>>
> > >>> One way we could do this is to create a folder called downstream. In
> > >>> our release step we can execute the downstream builds and then copy
> > >>> the files we need back. That way nothing downstream will be on the
> > >>> classpath of the main project.
> > >>>
> > >>> This could help us break up ql as well. Things like exotic file
> > >>> formats, and things that are pluggable like zk locking, can go here.
> > >>> That might be overkill.
> > >>>
> > >>> For now we can focus on building downstream, and hivethrift1 might be
> > >>> the first thing to move downstream.
> > >>>
> > >>>
> > >>> On Friday, July 26, 2013, Thejas Nair <th...@hortonworks.com> wrote:
> > >>>> +1 to the idea of making the build of core hive and other downstream
> > >>>> components independent.
> > >>>>
> > >>>> bq.  I was under the impression that Hcat and hive-metastore was
> > >>>> supposed to merge up somehow.
> > >>>>
> > >>>> The metastore code was never forked. Hcat was just using
> > >>>> hive-metastore and making the metadata available to the rest of
> > >>>> Hadoop (Pig, Java MR, ...).
> > >>>> A lot of the changes that were driven by hcat goals were being made
> > >>>> in hive-metastore. You can think of hcat as a set of libraries that
> > >>>> let Pig and Java MR use the hive metastore. Since hcat is closely tied
> > >>>> to hive-metastore, it makes sense to have them in the same project.
> > >>>>
> > >>>>
> > >>>> On Fri, Jul 26, 2013 at 6:33 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >>>>> Also, I believe hcatalog web can fall into the same designation.
> > >>>>>
> > >>>>> Question: hcatalog was initially a big hive-metastore fork. I was
> > >>>>> under the impression that Hcat and hive-metastore was supposed to
> > >>>>> merge up somehow. What is the status on that? I remember that was one
> > >>>>> of the core reasons we brought it in.
> > >>>>>
> > >>>>> On Friday, July 26, 2013, Edward Capriolo <ed...@gmail.com> wrote:
> > >>>>>> I prefer option 3 as well.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 26, 2013 at 12:52 AM, Brock Noland <brock@cloudera.com> wrote:
> > >>>>>>>
> > >>>>>>> On Thu, Jul 25, 2013 at 9:48 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> I have been developing on a dual-core, 2 GB RAM laptop for years
> > >>>>>>>> now. With the addition of hcatalog, hive-thrift2, and some other
> > >>>>>>>> growth, trying to develop hive in Eclipse on this machine crawls,
> > >>>>>>>> especially if 'build automatically' is turned on. As we look to
> > >>>>>>>> add more things, this is only going to get worse.
> > >>>>>>>>
> > >>>>>>>> I am also noticing issues like this:
> > >>>>>>>>
> > >>>>>>>> https://issues.apache.org/jira/browse/HIVE-4849
> > >>>>>>>>
> > >>>>>>>> What I think we should do is strip down/out optional parts of hive.
> > >>>>>>>>
> > >>>>>>>> 1) Hive Hbase
> > >>>>>>>> This should really be its own project. To do this right we really
> > >>>>>>>> have to have multiple branches, since hbase is not backwards
> > >>>>>>>> compatible.
> > >>>>>>>>
> > >>>>>>>> 2) Hive Web Interface
> > >>>>>>>> Now really a big project, but not really critical; it can just as
> > >>>>>>>> easily be built separately.
> > >>>>>>>>
> > >>>>>>>> 3) hive thrift 1
> > >>>>>>>> We have hive thrift 2 now; it is time for the sun to set on
> > >>>>>>>> hivethrift1.
> > >>>>>>>>
> > >>>>>>>> 4) odbc
> > >>>>>>>> Not entirely convinced about this one, but it is really not
> > >>>>>>>> critical to running hive.
> > >>>>>>>>
> > >>>>>>>> What I think we should do is create sub-projects for the above
> > >>>>>>>> things or simply move them into directories that do not build with
> > >>>>>>>> hive. Ideally they would use maven to pull dependencies.
> > >>>>>>>>
> > >>>>>>>> What does everyone think?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> I agree that projects like the HBase handler and probably others
> > >>>>>>> as well should somehow be "downstream" projects which simply depend
> > >>>>>>> on the hive jars. I see a couple of alternatives for this:
> > >>>>>>>
> > >>>>>>> * Take the "module" in question to the Apache Incubator
> > >>>>>>> * Move the "module" in question to the Apache Extras
> > >>>>>>> * Breakup the projects within our own source tree
> > >>>>>>>
> > >>>>>>> I'd prefer the third option at this point.
> > >>>>>>>
> > >>>>>>> Brock
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>
> > >>
> >
> >
>



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
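Edward's beeline example and Mohammad's "only key modules will be built" both reduce to one question: which modules does a given module transitively depend on? A minimal sketch, assuming a simple map from each module to its direct dependencies, computes that upstream set with a depth-first walk. This is the same kind of selection Maven makes for "mvn -pl <module> -am" (also-make). The module names are taken from this thread, but the dependency edges (for example, beeline depending on a hive-service module) are hypothetical and are not Hive's actual build graph.

```java
import java.util.*;

/**
 * Sketch: the set of modules needed to build (or open in an IDE) one module.
 * The edges used in main() are illustrative assumptions, not Hive's real POMs.
 */
public class UpstreamModules {

    /** Returns the target plus every module it transitively depends on. */
    static Set<String> requiredFor(String target, Map<String, List<String>> deps) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(target);
        while (!stack.isEmpty()) {
            String module = stack.pop();
            if (seen.add(module)) {
                // Visit direct dependencies; modules absent from the map have none.
                for (String dep : deps.getOrDefault(module, List.of())) {
                    stack.push(dep);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical edges: module -> modules it depends on.
        Map<String, List<String>> deps = Map.of(
            "hive-common", List.of("hive-shims"),
            "hive-serde", List.of("hive-common"),
            "hive-metastore", List.of("hive-serde"),
            "hive-exec", List.of("hive-metastore", "hive-serde"),
            "hive-service", List.of("hive-exec"),
            "beeline", List.of("hive-service"),
            "hive-hbase-handler", List.of("hive-exec"),
            "hcatalog", List.of("hive-exec", "hive-metastore"));

        // Sorted only to make the printout stable.
        System.out.println(new TreeSet<>(requiredFor("beeline", deps)));
        // [beeline, hive-common, hive-exec, hive-metastore, hive-serde, hive-service, hive-shims]
    }
}
```

Under these assumed edges, the HBase handler and hcatalog never enter the set, which is the decoupling the thread is after: a downstream module can be skipped entirely when working on something upstream of it.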

Re: [Discuss] project chop up

Posted by Brock Noland <br...@cloudera.com>.

FYI I am still waiting on Infra for the CMS move:
https://issues.apache.org/jira/browse/INFRA-6593



Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

I think that is a good idea. I have been thinking about it a lot. I
especially hate how the offline build is now broken.

However, I think it is going to take some time. There are some tricks, like
how we build the hive-exec jar, that are not very clean to do in Maven. I am
very interested

The last initiative we spoke about on the list was moving from Forrest. I
would like to finish/start that before we get onto the project chop up.



Hive trunk Jenkins build failed

Posted by Mohammad Islam <mi...@yahoo.com>.

Hi,
I found that the last Jenkins build failed for hadoop2.
The console output link and error message are pasted below.

I also experienced the same issue yesterday. Running "ant very-clean...." fixed my issue (thanks to Ashutosh!).

If someone can run the same on the build machine, the problem might go away. There could be other ways to solve it as well.

Regards,
Mohammad

https://builds.apache.org/job/Hive-trunk-hadoop2/339/


compile-test:
     [echo] Project: jdbc
    [javac] Compiling 2 source files to /home/jenkins/jenkins-slave/workspace/Hive-trunk-hadoop2/hive/build/jdbc/test/classes
    [javac] /home/jenkins/jenkins-slave/workspace/Hive-trunk-hadoop2/hive/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java:677: cannot find symbol
    [javac] symbol  : variable HIVE_SERVER2_TABLE_TYPE_MAPPING
    [javac] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
    [javac]     stmt.execute("set " + HiveConf.ConfVars.HIVE_SERVER2_TABLE_TYPE_MAPPING.varname +
    [javac]                                            ^
    [javac] /home/jenkins/jenkins-slave/workspace/Hive-trunk-hadoop2/hive/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java:684: cannot find symbol
    [javac] symbol  : variable HIVE_SERVER2_TABLE_TYPE_MAPPING
    [javac] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
    [javac]     stmt.execute("set " + HiveConf.ConfVars.HIVE_SERVER2_TABLE_TYPE_MAPPING.varname +

Re: [Discuss] project chop up

Posted by Owen O'Malley <om...@apache.org>.

On Wed, Aug 7, 2013 at 2:04 PM, Edward Capriolo <ed...@gmail.com> wrote:

> "Some of the hard part was that some of the test classes are in the wrong
> module that references classes in a later module."
>
> I think the modules will have to be able to reference each other in many
> cases. Serde and QL are tightly coupled. QL is really too large and we
> should find a way to cut that up.
>

Of course the modules need to reference each other. The problematic test
classes depend on modules lower in the tree, so they form a cycle in the
dependency graph. It only works in the ant build because it compiles all of
the modules before it does the test-compile in any of the modules.

-- Owen


>
> Part of this problem is the q tests.
>
> I think one way to handle this is to only allow unit tests inside the
> module. I imagine running all the q tests would be done in a final module,
> hive-qtest. Or possibly two final modules:
> hive-qtest
> hive-qtest-extra (tangential things like UDFs and input formats not core
> to hive)
>
>
> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <om...@apache.org> wrote:
>
> > On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <kulkarni.swarnim@gmail.com> wrote:
> >
> > > > I'd like to propose we move towards Maven.
> > >
> > > Big +1 on this. Most of the major Apache projects (hadoop, hbase, avro,
> > > etc.) are maven based.
> > >
> >
> > A big +1 from me too. I actually took a pass at it a couple of months
> > ago. Some of the hard part was that some of the test classes are in the
> > wrong module that references classes in a later module. Obviously that
> > prevents any kind of modular build.
> >
> > An additional plus to Maven is that Maven includes tools to correct the
> > project and module dependencies.
> >
> > -- Owen
> >
>
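Owen's cycle argument and Edward's hive-qtest proposal can be made concrete with a topological sort. The sketch below uses Kahn's algorithm over assumed "module -> dependencies" edges: with main-code edges only it yields a build order; adding a hypothetical test-time edge from hive-serde to hive-exec (serde tests calling into ql) leaves no valid order; and moving that test into a final hive-qtest module restores one. Maven's reactor likewise refuses to build modules whose dependencies form a cycle. All edges here are illustrative assumptions, not Hive's real POMs.

```java
import java.util.*;

/**
 * Sketch: order modules so each is built after its dependencies (Kahn's
 * topological sort). Returns Optional.empty() when the edges contain a cycle.
 */
public class BuildOrder {

    /** deps maps each module to the modules it depends on (no duplicates assumed). */
    static Optional<List<String>> order(Map<String, List<String>> deps) {
        // Every module mentioned anywhere, sorted so the output is deterministic.
        Set<String> modules = new TreeSet<>(deps.keySet());
        deps.values().forEach(modules::addAll);

        // remaining: dependencies not yet built; dependents: reverse edges.
        Map<String, Integer> remaining = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (String m : modules) {
            List<String> ds = deps.getOrDefault(m, List.of());
            remaining.put(m, ds.size());
            for (String d : ds) {
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(m);
            }
        }

        // Start with modules that depend on nothing.
        Deque<String> ready = new ArrayDeque<>();
        for (String m : modules) {
            if (remaining.get(m) == 0) {
                ready.add(m);
            }
        }

        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String m = ready.poll();
            result.add(m);
            // Building m satisfies one dependency of each module that needs it.
            for (String dependent : dependents.getOrDefault(m, List.of())) {
                if (remaining.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }

        // Any module never reached is stuck behind a cycle.
        return result.size() == modules.size() ? Optional.of(result) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed main-code edges: module -> modules it depends on.
        Map<String, List<String>> mainDeps = Map.of(
            "hive-common", List.of("hive-shims"),
            "hive-serde", List.of("hive-common"),
            "hive-exec", List.of("hive-serde"));
        System.out.println(order(mainDeps));
        // Optional[[hive-shims, hive-common, hive-serde, hive-exec]]

        // Hypothetical test-time edge: serde tests that call into ql (hive-exec).
        Map<String, List<String>> withTests = new HashMap<>(mainDeps);
        withTests.put("hive-serde", List.of("hive-common", "hive-exec"));
        System.out.println(order(withTests));
        // Optional.empty  (hive-serde <-> hive-exec cycle)

        // Move that test into a final hive-qtest module that depends on both.
        Map<String, List<String>> fixed = new HashMap<>(mainDeps);
        fixed.put("hive-qtest", List.of("hive-exec", "hive-serde"));
        System.out.println(order(fixed));
        // Optional[[hive-shims, hive-common, hive-serde, hive-exec, hive-qtest]]
    }
}
```

The design point matches Owen's: ant hides the cycle by compiling every module before any test-compile, whereas a per-module build has to find an order, and a single misplaced test edge is enough to make that impossible.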

Re: [Discuss] project chop up

Posted by Thiruvel Thirumoolan <th...@yahoo-inc.com>.

+1 Thanks Edward.

On 8/20/13 11:35 PM, "amareshwari sriramdasu" <am...@gmail.com> wrote:

>Sounds great! Looking forward to it!
>
>
>On Tue, Aug 20, 2013 at 7:58 PM, Edward Capriolo <ed...@gmail.com> wrote:
>
>> Just an update. This is going very well:
>>
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Apache Hive ....................................... SUCCESS [0.002s]
>> [INFO] hive-shims-x ...................................... SUCCESS [1.210s]
>> [INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
>> [INFO] hive-common ....................................... SUCCESS [0.082s]
>> [INFO] hive-serde ........................................ SUCCESS [2.521s]
>> [INFO] hive-metastore .................................... SUCCESS [10.818s]
>> [INFO] hive-exec ......................................... SUCCESS [4.521s]
>> [INFO] hive-avro ......................................... SUCCESS [1.582s]
>> [INFO] hive-zookeeper .................................... SUCCESS [0.519s]
>> [INFO] ------------------------------------------------------------------------
>> [INFO] BUILD SUCCESS
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Total time: 21.613s
>> [INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
>> [INFO] Final Memory: 39M/408M
>>
>> Though I did take some shortcuts and disabled some tests, we can build hive
>> very fast, including incremental builds. Also, we are using maven plugins
>> to compile antlr, thrift, protobuf, and DataNucleus, and building those
>> every time.
>>
>>
>> On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xz...@cloudera.com> wrote:
>>
>> > Thanks, Edward.
>> >
>> > I'm a big +1 to mavenizing Hive. Hive long ago reached a point where it's
>> > hard to manage its build using ant. I'd like to help on this too.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> >
>> > On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> >
>> > > For those interested in pitching in.
>> > > https://github.com/edwardcapriolo/hive
>> > >
>> > >
>> > >
>> > > On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <
>> edlinuxguru@gmail.com
>> > > >wrote:
>> > >
>> > > > Summary from hive-irc channel. Minor edits for spell
>>check/grammar.
>> > > >
>> > > > The last 10 lines are a summary of the key points.
>> > > >
>> > > > [10:59:17] <ecapriolo> noland: et all. Do you want to talk about
>>hive
>> > in
>> > > > maven?
>> > > > [11:01:06] smonchi [~
>> > > > ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit
>> > IRC:
>> > > > Quit: ... 'cause there is no patch for human stupidity ...
>> > > > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
>> > > > [11:10:22] <noland> I saw you created the jira but haven't had
>>time
>> to
>> > > look
>> > > > [11:10:32] <ecapriolo> So I found a few things
>> > > > [11:10:49] <ecapriolo> In common there is one or two testats that
>> > > actually
>> > > > fork a process :)
>> > > > [11:10:56] <ecapriolo> and use build.test.resources
>> > > > [11:11:12] <ecapriolo> Some serde, uses some methods from ql in
>> testing
>> > > > [11:11:27] <ecapriolo> and shims really needs a separate hadoop
>>test
>> > shim
>> > > > [11:11:32] <ecapriolo> But that is all simple stuff
>> > > > [11:11:47] <ecapriolo> The biggest problem is I do not know how to
>> > solve
>> > > > shims with maven
>> > > > [11:11:50] <ecapriolo> do you have any ideas
>> > > > [11:11:52] <ecapriolo> ?
>> > > > [11:13:00] <noland> That one is going to be a challenge. It might
>>be
>> > that
>> > > > in that section we have to drop down to ant
>> > > > [11:14:44] <noland> Is it a requirement that we build both the .20
>> and
>> > > .23
>> > > > shims for a "package" as we do today?
>> > > > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC
>>driver
>> > > > [11:16:59] <ecapriolo> Se separate out the interface of shims
>> > > > [11:17:22] <ecapriolo> And then at runtime we drop in a driver
>> > > implementing
>> > > > [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC:
>>Remote
>> > host
>> > > > closed the connection
>> > > > [11:17:36] <ecapriolo> That or we could use maven's profile system
>> > > > [11:18:09] <ecapriolo> It seems that everything else can actually
>> link
>> > > > against hadoop-0.20.2 as a provided dependency
>> > > > [11:18:37] <noland> Yeah either would work. The driver method
>>would
>> > > > probably require use to use ant build both the drivers?
>> > > > [11:18:44] <noland> I am a fan of mvn profiles
>> > > > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim
>>out
>> > into
>> > > > its own project,, not a module
>> > > > [11:19:10] <ecapriolo> to achive that jdbc thing
>> > > > [11:19:27] <ecapriolo> But I do not have a solution yet, I was
>> looking
>> > to
>> > > > farm that out to someone smart...like you :)
>> > > > [11:19:33] <noland> :)
>> > > > [11:19:47] <ecapriolo> All I know is that we need a test shim
>>because
>> > > > HadoopShim requires hadoop-test jars
>> > > > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest
>> anyway
>> > > > [11:20:48] <ecapriolo> Is this something you want to help with? I
>>was
>> > > > thinking of spinning up a github
>> > > > [11:20:50] <noland> I think that the separate projects would work
>>and
>> > > > perhaps nicely.
>> > > > [11:21:01] <noland> Yeah I'd be interested in helping!
>> > > > [11:21:17] <noland> But I am going on vacation starting next week
>>for
>> > > > about 10 days
>> > > > [11:21:27] <ecapriolo> Ah cool where are you going?
>> > > > [11:21:37] <noland> Netherlands
>> > > > [11:21:42] <noland> Biking around and such
>> > > > [11:23:52] <noland> The one thing I was thinking about with
>>regards
>> to
>> > a
>> > > > branch is keeping history. We'll want to keep history for the
>>files
>> but
>> > > > AFAICT svn doesn't understand git mv.
>> > > > [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
>> > > > [11:31:19] jeromatron
>>[~textual@host90-152-1-162.ipv4.regusnet.com]
>> > has
>> > > > quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzzŠ
>> > > > [11:35:49] <ecapriolo> noland: Right I do not play to suggest
>>that we
>> > > will
>> > > > do this in git
>> > > > [11:36:11] <ecapriolo> I just see that we are going to have to
>>hack
>> > stuff
>> > > > up and it is not the type of work that lends itself well to
>>branches.
>> > > > [11:36:17] <noland> Ahh ok
>> > > > [11:36:56] <ecapriolo> Once we come up with a solution for the
>>shims,
>> > and
>> > > > we have something that can reasonably build and test hive we can
>> figure
>> > > out
>> > > > how to apply that to a branch/trunk
>> > > > [11:36:58] <noland> yeah so just do a POC on github and then
>> implement
>> > on
>> > > > svn
>> > > > [11:37:05] <noland> cool
>> > > > [11:37:29] <ecapriolo> Along the way we can probably find things
>>that
>> > we
>> > > > can do like that common test I found and other minor things
>> > > > [11:37:41] <noland> sounds good
>> > > > [11:37:50] <ecapriolo> Those we can likely just commit into the
>> current
>> > > > trunk and I will file issues for those now
>> > > > [11:37:58] <noland> cool
>> > > > [11:38:41] <ecapriolo> But yea man. I just cant take the project
>>as
>> it
>> > is
>> > > > now
>> > > > [11:38:51] <ecapriolo> in eclipse everytime I touch a file it
>> rebuilds
>> > > > everything!
>> > > > [11:38:53] <ecapriolo> Its like WTF
>> > > > [11:39:09] <ecapriolo> Running one tests takes like 3 minutes
>> > > > [11:39:12] <ecapriolo> its out of control
>> > > > [11:39:23] <noland> LOL
>> > > > [11:39:29] <noland> I agree 110%
>> > > > [11:39:32] <ecapriolo> eclipse was not always like that I am not
>>sure
>> > how
>> > > > the hell it happened
>> > > > [11:39:51] <noland> The eclipse sep thing is so harmful
>> > > > [11:40:08] <noland> dep thing that is
>> > > > [11:40:12] <ecapriolo> I mean command line ant was always bad, but
>> you
>> > > > used to be able to work in eclipse without having to rebuild
>> everything
>> > > > every change/test
>> > > > [11:40:39] <noland> Yeah the first thing I do these days is
>>disable
>> the
>> > > > ant builder
>> > > > [11:40:52] <ecapriolo> Ow... I did not really know that was a
>>thing
>> > > > [11:40:55] <noland> it starts compiling while you are still
>>working
>> and
>> > > > blocks for minutes
>> > > > [11:41:02] <ecapriolo> Right that is what I mean
>> > > > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the
>> > project
>> > > > [11:41:14] <noland> yeah you can remove it in projectŠone sec
>> > > > [11:41:17] <ecapriolo> perm gen
>> > > > [11:41:20] <ecapriolo> ant builder
>> > > > [11:41:32] <noland> project -> properties -> builders
>> > > > [11:41:34] <ecapriolo> hive does not build offline anymore
>> > > > [11:41:37] <noland> yeah
>> > > > [11:41:47] <ecapriolo> Im not sure when this stuff went bad, but
>>it
>> has
>> > > > gotten really really bad
>> > > > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
>> > > > non-essentials
>> > > > [11:42:25] <ecapriolo> like serde has all this thrift and avro
>>stuff
>> to
>> > > > support custom formats
>> > > > [11:42:30] <ecapriolo> that is going into its own module
>> > > > [11:42:43] <ecapriolo> Going to rip out all the udfs accept
>>between
>> and
>> > > or.
>> > > > [11:43:50] <noland> yeah it'd be nice to have those items in their
>> own
>> > > > modules so you can just build/test them when you want
>> > > > [11:44:12] <ecapriolo> hbase zookeeper locking
>> > > > [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC:
>>Remote
>> > host
>> > > > closed the connection
>> > > > [11:44:44] <noland> yeah for sure
>> > > > [11:45:04] <noland> I think the default for testing should be the
>>in
>> > > > process locking
>> > > > [11:45:10] <ecapriolo> Absolutely.
>> > > > [11:45:40] <ecapriolo> The other issue I want to tackle is
>> > hive-exec.jar
>> > > > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
>> > > > [11:46:46] <ecapriolo> I run into to many conflicts with log4j and
>> > guava,
>> > > > and commons-utils all those things need to be packaged into
>> > > non-conflicting
>> > > > packages
>> > > > [11:46:58] <noland> I haven't looked at how we build that yet but
>>I
>> > agree
>> > > > it'd be nice if we could jar-jar things like guava
>> > > > [11:47:12] <noland> so we can actually use them on server side
>> > > > [11:47:16] <ecapriolo> We dont really need quava. its probably
>>just
>> > used
>> > > > for one tiny thing
>> > > > [11:47:43] <ecapriolo> People are forgetting/do not understand
>>that
>> > > > hive-exec needs to get sent via the distributed cache
>> > > > [11:47:57] <noland> Wen we implement range joins they have a
>>RangeMap
>> > > that
>> > > > we'll need.
>> > > > [11:47:57] <ecapriolo> so making it hulkingly fat just slows
>> everything
>> > > > down
>> > > > [11:48:11] <noland> Do we ship it every time?
>> > > > [11:48:25] <noland> Cause we only have to ship it once per
>>version of
>> > the
>> > > > jar.
>> > > > [11:48:42] <ecapriolo> Recently you need the jackson jars on the
>> auxlib
>> > > as
>> > > > well
>> > > > [11:48:46] <ecapriolo> hive will not work without it
>> > > > [11:49:11] <ecapriolo> People are just focused
>> > > > feature-feature-feature...bigger...bigger bigger
>> > > > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq]
>>has
>> > quit
>> > > > IRC: Quit: Leaving
>> > > > [11:49:27] <noland> yeah maven modules will definitely help us
>> > understand
>> > > > who depends on what.
>> > > > [11:49:28] <ecapriolo> Next up kyro
>> > > > [11:49:51] <noland> I agree there is a lot of tech debt that needs
>> > paying
>> > > > [11:50:30] <ecapriolo> So those are all the high level things I
>>want
>> to
>> > > > tackle
>> > > > [11:50:59] <ecapriolo> shims, general cleanup, break out
>> non-essential
>> > > > code, build a better non conflicting hive-exec jar
>> > > > [11:51:10] <noland> That sounds good. Once we hack on github for a
>> > while
>> > > > it'd be nice to develop a brief high level plan on how to
>>implement
>> > > > [11:51:26] <ecapriolo> Also get maven artifacts with correct
>> depencency
>> > > > scopes like provided etc
>> > > > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is
>> like
>> > > > pulling in the world
>> > > > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
>> > > >
>> > > >
>> > > > On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <
>> > edlinuxguru@gmail.com
>> > > >wrote:
>> > > >
>> > > >> I have opened
>> https://issues.apache.org/jira/browse/HIVE-5107because I
>> > > >> am growing tired of how long hive's build take.
>> > > >>
>> > > >> I have started playing with this by creating a simple
>>multi-module
>> > > >> project and copying stuff as I go. I have ported a minimal shims
>>and
>> > > common
>> > > >> and I have all the tests in common almost running.
>> > > >>
>> > > >> Q. This is going to be ugly hacky work for a while, I was
>>thinking
>> it
>> > > >> should be a branch but it is just going to be a mess of moves and
>> > copies
>> > > >> etc. Not really something you can diff etc.
>> > > >>
>> > > >> Is anyone else interested in working on this as well. If so I
>>think
>> we
>> > > >> can just setup a github and I can arrange for anyone to have
>>access
>> to
>> > > it.
>> > > >>
>> > > >> Thanks,
>> > > >> Edward
>> > > >>
>> > > >>
>> > > >> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <
>> > edlinuxguru@gmail.com
>> > > >wrote:
>> > > >>
>> > > >>> "Some of the hard part was that some of the test classes are in
>>the
>> > > wrong
>> > > >>> module that references classes in a later module."
>> > > >>>
>> > > >>> I think the modules will have to be able to reference each
>>other in
>> > > many
>> > > >>> cases. Serde and QL are tightly coupled. QL is really too large
>>and
>> > we
>> > > >>> should find a way to cut that up.
>> > > >>>
>> > > >>> Part of this problem is the q.tests
>> > > >>>
>> > > >>> I think one way to handle this is to only allow unit tests
>>inside
>> the
>> > > >>> module. I imagine running all the q tests would be done in a
>>final
>> > > module
>> > > >>> hive-qtest. Or possibly two final modules
>> > > >>> hive-qtest
>> > > >>> hive-qtest-extra (tangential things like UDFS and input formats
>>not
>> > > core
>> > > >>> to hive)
>> > > >>>
>> > > >>>
>> > > >>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley
>><omalley@apache.org
>> > > >wrote:
>> > > >>>
>> > > >>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
>> > > >>>> kulkarni.swarnim@gmail.com> wrote:
>> > > >>>>
>> > > >>>> > > I'd like to propose we move towards Maven.
>> > > >>>> >
>> > > >>>> > Big +1 on this. Most of the major apache projects(hadoop,
>>hbase,
>> > > avro
>> > > >>>> etc.)
>> > > >>>> > are maven based.
>> > > >>>> >
>> > > >>>>
>> > > >>>> A big +1 from me too. I actually took a pass at it a couple of
>> > months
>> > > >>>> ago.
>> > > >>>> Some of the hard part was that some of the test classes are in
>>the
>> > > wrong
>> > > >>>> module that references classes in a later module. Obviously
>>that
>> > > >>>> prevents
>> > > >>>> any kind of modular build.
>> > > >>>>
>> > > >>>> As an additional plus to Maven is that Maven includes tools to
>> > correct
>> > > >>>> the
>> > > >>>> project and module dependencies.
>> > > >>>>
>> > > >>>> -- Owen
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> >
>>

Re: [Discuss] project chop up

Posted by amareshwari sriramdasu <am...@gmail.com>.

Sounds great! Looking forward!


On Tue, Aug 20, 2013 at 7:58 PM, Edward Capriolo <ed...@gmail.com> wrote:

> Just an update. This is going very well:
>
> [INFO] Nothing to compile - all classes are up to date
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hive ....................................... SUCCESS [0.002s]
> [INFO] hive-shims-x ...................................... SUCCESS [1.210s]
> [INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
> [INFO] hive-common ....................................... SUCCESS [0.082s]
> [INFO] hive-serde ........................................ SUCCESS [2.521s]
> [INFO] hive-metastore .................................... SUCCESS [10.818s]
> [INFO] hive-exec ......................................... SUCCESS [4.521s]
> [INFO] hive-avro ......................................... SUCCESS [1.582s]
> [INFO] hive-zookeeper .................................... SUCCESS [0.519s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 21.613s
> [INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
> [INFO] Final Memory: 39M/408M
>
>
> Though I took some shortcuts and disabled some tests, we can build hive
> very fast, including incremental builds. We are also using maven plugins to
> compile antlr, thrift, protobuf, and DataNucleus, building those every time.
>
>
> On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xz...@cloudera.com> wrote:
>
> > Thanks, Edward.
> >
> > I'm a big +1 to mavenizing Hive. Hive long ago reached the point where
> > it's hard to manage its build using ant. I'd like to help on this too.
> >
> > Thanks,
> > Xuefu
> >
> >
> > On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > > For those interested in pitching in.
> > > https://github.com/edwardcapriolo/hive
> > >
> > >
> > >
> > > On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >
> > > > Summary from hive-irc channel. Minor edits for spell check/grammar.
> > > >
> > > > The last 10 lines are a summary of the key points.
> > > >
> > > > [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in maven?
> > > > [11:01:06] smonchi [~ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC: Quit: ... 'cause there is no patch for human stupidity ...
> > > > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
> > > > [11:10:22] <noland> I saw you created the jira but haven't had time to look
> > > > [11:10:32] <ecapriolo> So I found a few things
> > > > [11:10:49] <ecapriolo> In common there are one or two tests that actually fork a process :)
> > > > [11:10:56] <ecapriolo> and use build.test.resources
> > > > [11:11:12] <ecapriolo> Some serde code uses methods from ql in testing
> > > > [11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
> > > > [11:11:32] <ecapriolo> But that is all simple stuff
> > > > [11:11:47] <ecapriolo> The biggest problem is I do not know how to solve shims with maven
> > > > [11:11:50] <ecapriolo> do you have any ideas
> > > > [11:11:52] <ecapriolo> ?
> > > > [11:13:00] <noland> That one is going to be a challenge. It might be that in that section we have to drop down to ant
> > > > [11:14:44] <noland> Is it a requirement that we build both the .20 and .23 shims for a "package" as we do today?
> > > > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
> > > > [11:16:59] <ecapriolo> So we separate out the interface of shims
> > > > [11:17:22] <ecapriolo> And then at runtime we drop in a driver implementing it
> > > > [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host closed the connection
> > > > [11:17:36] <ecapriolo> That or we could use maven's profile system
> > > > [11:18:09] <ecapriolo> It seems that everything else can actually link against hadoop-0.20.2 as a provided dependency
> > > > [11:18:37] <noland> Yeah either would work. The driver method would probably require us to use ant to build both the drivers?
> > > > [11:18:44] <noland> I am a fan of mvn profiles
> > > > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into its own project, not a module
> > > > [11:19:10] <ecapriolo> to achieve that jdbc thing
> > > > [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to farm that out to someone smart...like you :)
> > > > [11:19:33] <noland> :)
> > > > [11:19:47] <ecapriolo> All I know is that we need a test shim because HadoopShim requires hadoop-test jars
> > > > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
> > > > [11:20:48] <ecapriolo> Is this something you want to help with? I was thinking of spinning up a github
> > > > [11:20:50] <noland> I think that the separate projects would work and perhaps nicely.
> > > > [11:21:01] <noland> Yeah I'd be interested in helping!
> > > > [11:21:17] <noland> But I am going on vacation starting next week for about 10 days
> > > > [11:21:27] <ecapriolo> Ah cool where are you going?
> > > > [11:21:37] <noland> Netherlands
> > > > [11:21:42] <noland> Biking around and such
> > > > [11:23:52] <noland> The one thing I was thinking about with regards to a branch is keeping history. We'll want to keep history for the files but AFAICT svn doesn't understand git mv.
> > > > [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
> > > > [11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com] has quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> > > > [11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we will do this in git
> > > > [11:36:11] <ecapriolo> I just see that we are going to have to hack stuff up and it is not the type of work that lends itself well to branches.
> > > > [11:36:17] <noland> Ahh ok
> > > > [11:36:56] <ecapriolo> Once we come up with a solution for the shims, and we have something that can reasonably build and test hive, we can figure out how to apply that to a branch/trunk
> > > > [11:36:58] <noland> yeah so just do a POC on github and then implement on svn
> > > > [11:37:05] <noland> cool
> > > > [11:37:29] <ecapriolo> Along the way we can probably find things that we can do, like that common test I found and other minor things
> > > > [11:37:41] <noland> sounds good
> > > > [11:37:50] <ecapriolo> Those we can likely just commit into the current trunk and I will file issues for those now
> > > > [11:37:58] <noland> cool
> > > > [11:38:41] <ecapriolo> But yea man. I just can't take the project as it is now
> > > > [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds everything!
> > > > [11:38:53] <ecapriolo> It's like WTF
> > > > [11:39:09] <ecapriolo> Running one test takes like 3 minutes
> > > > [11:39:12] <ecapriolo> it's out of control
> > > > [11:39:23] <noland> LOL
> > > > [11:39:29] <noland> I agree 110%
> > > > [11:39:32] <ecapriolo> eclipse was not always like that, I am not sure how the hell it happened
> > > > [11:39:51] <noland> The eclipse sep thing is so harmful
> > > > [11:40:08] <noland> dep thing that is
> > > > [11:40:12] <ecapriolo> I mean command line ant was always bad, but you used to be able to work in eclipse without having to rebuild everything every change/test
> > > > [11:40:39] <noland> Yeah the first thing I do these days is disable the ant builder
> > > > [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
> > > > [11:40:55] <noland> it starts compiling while you are still working and blocks for minutes
> > > > [11:41:02] <ecapriolo> Right that is what I mean
> > > > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
> > > > [11:41:14] <noland> yeah you can remove it in project…one sec
> > > > [11:41:17] <ecapriolo> perm gen
> > > > [11:41:20] <ecapriolo> ant builder
> > > > [11:41:32] <noland> project -> properties -> builders
> > > > [11:41:34] <ecapriolo> hive does not build offline anymore
> > > > [11:41:37] <noland> yeah
> > > > [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has gotten really really bad
> > > > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out non-essentials
> > > > [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to support custom formats
> > > > [11:42:30] <ecapriolo> that is going into its own module
> > > > [11:42:43] <ecapriolo> Going to rip out all the udfs except between and or.
> > > > [11:43:50] <noland> yeah it'd be nice to have those items in their own modules so you can just build/test them when you want
> > > > [11:44:12] <ecapriolo> hbase zookeeper locking
> > > > [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host closed the connection
> > > > [11:44:44] <noland> yeah for sure
> > > > [11:45:04] <noland> I think the default for testing should be the in process locking
> > > > [11:45:10] <ecapriolo> Absolutely.
> > > > [11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
> > > > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
> > > > [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava, and commons-utils; all those things need to be packaged into non-conflicting packages
> > > > [11:46:58] <noland> I haven't looked at how we build that yet but I agree it'd be nice if we could jar-jar things like guava
> > > > [11:47:12] <noland> so we can actually use them on server side
> > > > [11:47:16] <ecapriolo> We don't really need guava. It's probably just used for one tiny thing
> > > > [11:47:43] <ecapriolo> People are forgetting/do not understand that hive-exec needs to get sent via the distributed cache
> > > > [11:47:57] <noland> When we implement range joins they have a RangeMap that we'll need.
> > > > [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything down
> > > > [11:48:11] <noland> Do we ship it every time?
> > > > [11:48:25] <noland> Cause we only have to ship it once per version of the jar.
> > > > [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib as well
> > > > [11:48:46] <ecapriolo> hive will not work without it
> > > > [11:49:11] <ecapriolo> People are just focused feature-feature-feature...bigger...bigger bigger
> > > > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit IRC: Quit: Leaving
> > > > [11:49:27] <noland> yeah maven modules will definitely help us understand who depends on what.
> > > > [11:49:28] <ecapriolo> Next up kryo
> > > > [11:49:51] <noland> I agree there is a lot of tech debt that needs paying
> > > > [11:50:30] <ecapriolo> So those are all the high level things I want to tackle
> > > > [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential code, build a better non conflicting hive-exec jar
> > > > [11:51:10] <noland> That sounds good. Once we hack on github for a while it'd be nice to develop a brief high level plan on how to implement
> > > > [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency scopes like provided etc
> > > > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like pulling in the world
> > > > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
> > > >
> > > >
> > > > On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >
> > > >> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
> > > >> am growing tired of how long hive's build takes.
> > > >>
> > > >> I have started playing with this by creating a simple multi-module
> > > >> project and copying stuff as I go. I have ported a minimal shims and
> > > >> common, and I have all the tests in common almost running.
> > > >>
> > > >> Q. This is going to be ugly, hacky work for a while. I was thinking it
> > > >> should be a branch, but it is just going to be a mess of moves and
> > > >> copies etc. Not really something you can diff.
> > > >>
> > > >> Is anyone else interested in working on this as well? If so, I think we
> > > >> can just set up a github repo and I can arrange for anyone to have
> > > >> access to it.
> > > >>
> > > >> Thanks,
> > > >> Edward
> > > >>
> > > >>
> > > >> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >>
> > > >>> "Some of the hard part was that some of the test classes are in the wrong
> > > >>> module that references classes in a later module."
> > > >>>
> > > >>> I think the modules will have to be able to reference each other in many
> > > >>> cases. Serde and QL are tightly coupled. QL is really too large and we
> > > >>> should find a way to cut that up.
> > > >>>
> > > >>> Part of this problem is the q.tests
> > > >>>
> > > >>> I think one way to handle this is to only allow unit tests inside the
> > > >>> module. I imagine running all the q tests would be done in a final module,
> > > >>> hive-qtest. Or possibly two final modules:
> > > >>> hive-qtest
> > > >>> hive-qtest-extra (tangential things like UDFs and input formats not core
> > > >>> to hive)
> > > >>>
> > > >>>
> > > >>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <omalley@apache.org> wrote:
> > > >>>
> > > >>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <kulkarni.swarnim@gmail.com> wrote:
> > > >>>>
> > > >>>> > > I'd like to propose we move towards Maven.
> > > >>>> >
> > > >>>> > Big +1 on this. Most of the major apache projects (hadoop, hbase, avro,
> > > >>>> > etc.) are maven based.
> > > >>>> >
> > > >>>>
> > > >>>> A big +1 from me too. I actually took a pass at it a couple of months
> > > >>>> ago. Some of the hard part was that some of the test classes are in the
> > > >>>> wrong module that references classes in a later module. Obviously that
> > > >>>> prevents any kind of modular build.
> > > >>>>
> > > >>>> An additional plus to Maven is that it includes tools to correct the
> > > >>>> project and module dependencies.
> > > >>>>
> > > >>>> -- Owen
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
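The "JDBC driver" approach to shims discussed in the IRC log above can be sketched as follows: core code compiles only against a version-neutral interface, each Hadoop-specific implementation lives in a separate artifact, and at runtime a registry picks the implementation that accepts the detected Hadoop version, much as java.sql.DriverManager asks each registered Driver whether it accepts a URL. Everything here (HadoopShim, ShimRegistry, the version-prefix matching) is a hypothetical illustration, not Hive's actual shim-loading code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a JDBC-DriverManager-style registry for Hadoop shims. */
public class ShimRegistry {

    /** Version-neutral interface: the only shim type the rest of the code compiles against. */
    public interface HadoopShim {
        boolean acceptsVersion(String hadoopVersion);
        String describe();
    }

    /** Hypothetical 0.20 implementation; in a real layout it would ship in its own artifact. */
    public static final class Hadoop20Shim implements HadoopShim {
        @Override public boolean acceptsVersion(String v) { return v.startsWith("0.20"); }
        @Override public String describe() { return "shim for Hadoop 0.20.x"; }
    }

    /** Hypothetical 0.23 implementation, likewise packaged separately. */
    public static final class Hadoop23Shim implements HadoopShim {
        @Override public boolean acceptsVersion(String v) { return v.startsWith("0.23"); }
        @Override public String describe() { return "shim for Hadoop 0.23.x"; }
    }

    private static final List<HadoopShim> REGISTERED = new ArrayList<>();

    /** Like DriverManager.registerDriver: an implementation announces itself when loaded. */
    public static synchronized void register(HadoopShim shim) {
        REGISTERED.add(shim);
    }

    /** Like DriverManager.getDriver(url): the first registered shim that accepts the version wins. */
    public static synchronized HadoopShim forVersion(String hadoopVersion) {
        for (HadoopShim shim : REGISTERED) {
            if (shim.acceptsVersion(hadoopVersion)) {
                return shim;
            }
        }
        throw new IllegalArgumentException("No shim registered for Hadoop " + hadoopVersion);
    }

    public static void main(String[] args) {
        // Both shims are registered by hand here; a real deployment would put only one on the classpath.
        register(new Hadoop20Shim());
        register(new Hadoop23Shim());
        System.out.println(forVersion("0.20.2").describe()); // prints shim for Hadoop 0.20.x
        System.out.println(forVersion("0.23.9").describe()); // prints shim for Hadoop 0.23.x
    }
}
```

The Maven-profile alternative noland preferred would instead select one shim module at build time; that trade-off is not shown here.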

Re: [Discuss] project chop up

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

This is awesome!

Thanks,
+Vinod

On Aug 20, 2013, at 7:28 AM, Edward Capriolo wrote:

> Just an update. This is going very well:
> 
> [INFO] Nothing to compile - all classes are up to date
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hive ....................................... SUCCESS [0.002s]
> [INFO] hive-shims-x ...................................... SUCCESS [1.210s]
> [INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
> [INFO] hive-common ....................................... SUCCESS [0.082s]
> [INFO] hive-serde ........................................ SUCCESS [2.521s]
> [INFO] hive-metastore .................................... SUCCESS [10.818s]
> [INFO] hive-exec ......................................... SUCCESS [4.521s]
> [INFO] hive-avro ......................................... SUCCESS [1.582s]
> [INFO] hive-zookeeper .................................... SUCCESS [0.519s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 21.613s
> [INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
> [INFO] Final Memory: 39M/408M
> 
> 
> Though I took some shortcuts and disabled some tests, we can build hive
> very fast, including incremental builds. We are also using maven plugins to
> compile antlr, thrift, protobuf, and DataNucleus, building those every time.
> 
> 
> On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xz...@cloudera.com> wrote:
> 
>> Thanks, Edward.
>> 
>> I'm a big +1 to mavenizing Hive. Hive long ago reached the point where it's
>> hard to manage its build using ant. I'd like to help on this too.
>> 
>> Thanks,
>> Xuefu
>> 
>> 
>> On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> 
>>> For those interested in pitching in.
>>> https://github.com/edwardcapriolo/hive
>>> 
>>> 
>>> 
>>> On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>> 
>>>> Summary from hive-irc channel. Minor edits for spell check/grammar.
>>>> 
>>>> The last 10 lines are a summary of the key points.
>>>> 
>>>> [10:59:17] <ecapriolo> noland: et all. Do you want to talk about hive
>> in
>>>> maven?
>>>> [11:01:06] smonchi [~
>>>> ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit
>> IRC:
>>>> Quit: ... 'cause there is no patch for human stupidity ...
>>>> [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
>>>> [11:10:22] <noland> I saw you created the jira but haven't had time to
>>> look
>>>> [11:10:32] <ecapriolo> So I found a few things
>>>> [11:10:49] <ecapriolo> In common there is one or two testats that
>>> actually
>>>> fork a process :)
>>>> [11:10:56] <ecapriolo> and use build.test.resources
>>>> [11:11:12] <ecapriolo> Some serde, uses some methods from ql in testing
>>>> [11:11:27] <ecapriolo> and shims really needs a separate hadoop test
>> shim
>>>> [11:11:32] <ecapriolo> But that is all simple stuff
>>>> [11:11:47] <ecapriolo> The biggest problem is I do not know how to
>> solve
>>>> shims with maven
>>>> [11:11:50] <ecapriolo> do you have any ideas
>>>> [11:11:52] <ecapriolo> ?
>>>> [11:13:00] <noland> That one is going to be a challenge. It might be
>> that
>>>> in that section we have to drop down to ant
>>>> [11:14:44] <noland> Is it a requirement that we build both the .20 and
>>> .23
>>>> shims for a "package" as we do today?
>>>> [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
>>>> [11:16:59] <ecapriolo> We separate out the interface of shims
>>>> [11:17:22] <ecapriolo> And then at runtime we drop in a driver
>>> implementing
>>>> [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote
>> host
>>>> closed the connection
>>>> [11:17:36] <ecapriolo> That or we could use maven's profile system
>>>> [11:18:09] <ecapriolo> It seems that everything else can actually link
>>>> against hadoop-0.20.2 as a provided dependency
>>>> [11:18:37] <noland> Yeah either would work. The driver method would
>>>> probably require us to use ant to build both the drivers?
>>>> [11:18:44] <noland> I am a fan of mvn profiles
>>>> [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out
>> into
>>>> its own project, not a module
>>>> [11:19:10] <ecapriolo> to achieve that jdbc thing
>>>> [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking
>> to
>>>> farm that out to someone smart...like you :)
>>>> [11:19:33] <noland> :)
>>>> [11:19:47] <ecapriolo> All I know is that we need a test shim because
>>>> HadoopShim requires hadoop-test jars
>>>> [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
>>>> [11:20:48] <ecapriolo> Is this something you want to help with? I was
>>>> thinking of spinning up a github
>>>> [11:20:50] <noland> I think that the separate projects would work and
>>>> perhaps nicely.
>>>> [11:21:01] <noland> Yeah I'd be interested in helping!
>>>> [11:21:17] <noland> But I am going on vacation starting next week for
>>>> about 10 days
>>>> [11:21:27] <ecapriolo> Ah cool where are you going?
>>>> [11:21:37] <noland> Netherlands
>>>> [11:21:42] <noland> Biking around and such
>>>> [11:23:52] <noland> The one thing I was thinking about with regards to
>> a
>>>> branch is keeping history. We'll want to keep history for the files but
>>>> AFAICT svn doesn't understand git mv.
>>>> [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
>>>> [11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com]
>> has
>>>> quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
>>>> [11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we
>>> will
>>>> do this in git
>>>> [11:36:11] <ecapriolo> I just see that we are going to have to hack
>> stuff
>>>> up and it is not the type of work that lends itself well to branches.
>>>> [11:36:17] <noland> Ahh ok
>>>> [11:36:56] <ecapriolo> Once we come up with a solution for the shims,
>> and
>>>> we have something that can reasonably build and test hive we can figure
>>> out
>>>> how to apply that to a branch/trunk
>>>> [11:36:58] <noland> yeah so just do a POC on github and then implement
>> on
>>>> svn
>>>> [11:37:05] <noland> cool
>>>> [11:37:29] <ecapriolo> Along the way we can probably find things that
>> we
>>>> can do like that common test I found and other minor things
>>>> [11:37:41] <noland> sounds good
>>>> [11:37:50] <ecapriolo> Those we can likely just commit into the current
>>>> trunk and I will file issues for those now
>>>> [11:37:58] <noland> cool
>>>> [11:38:41] <ecapriolo> But yeah man. I just can't take the project as it
>> is
>>>> now
>>>> [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
>>>> everything!
>>>> [11:38:53] <ecapriolo> It's like WTF
>>>> [11:39:09] <ecapriolo> Running one test takes like 3 minutes
>>>> [11:39:12] <ecapriolo> it's out of control
>>>> [11:39:23] <noland> LOL
>>>> [11:39:29] <noland> I agree 110%
>>>> [11:39:32] <ecapriolo> eclipse was not always like that I am not sure
>> how
>>>> the hell it happened
>>>> [11:39:51] <noland> The eclipse sep thing is so harmful
>>>> [11:40:08] <noland> dep thing that is
>>>> [11:40:12] <ecapriolo> I mean command line ant was always bad, but you
>>>> used to be able to work in eclipse without having to rebuild everything
>>>> every change/test
>>>> [11:40:39] <noland> Yeah the first thing I do these days is disable the
>>>> ant builder
>>>> [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
>>>> [11:40:55] <noland> it starts compiling while you are still working and
>>>> blocks for minutes
>>>> [11:41:02] <ecapriolo> Right that is what I mean
>>>> [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the
>> project
>>>> [11:41:14] <noland> yeah you can remove it in project…one sec
>>>> [11:41:17] <ecapriolo> perm gen
>>>> [11:41:20] <ecapriolo> ant builder
>>>> [11:41:32] <noland> project -> properties -> builders
>>>> [11:41:34] <ecapriolo> hive does not build offline anymore
>>>> [11:41:37] <noland> yeah
>>>> [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
>>>> gotten really really bad
>>>> [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
>>>> non-essentials
>>>> [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
>>>> support custom formats
>>>> [11:42:30] <ecapriolo> that is going into its own module
>>>> [11:42:43] <ecapriolo> Going to rip out all the udfs except between and
>>> or.
>>>> [11:43:50] <noland> yeah it'd be nice to have those items in their own
>>>> modules so you can just build/test them when you want
>>>> [11:44:12] <ecapriolo> hbase zookeeper locking
>>>> [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote
>> host
>>>> closed the connection
>>>> [11:44:44] <noland> yeah for sure
>>>> [11:45:04] <noland> I think the default for testing should be the in
>>>> process locking
>>>> [11:45:10] <ecapriolo> Absolutely.
>>>> [11:45:40] <ecapriolo> The other issue I want to tackle is
>> hive-exec.jar
>>>> [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
>>>> [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
>>>> and commons-utils. All those things need to be packaged into
>>>> non-conflicting packages
>>>> [11:46:58] <noland> I haven't looked at how we build that yet but I
>> agree
>>>> it'd be nice if we could jar-jar things like guava
>>>> [11:47:12] <noland> so we can actually use them on server side
>>>> [11:47:16] <ecapriolo> We don't really need guava. It's probably just
>> used
>>>> for one tiny thing
>>>> [11:47:43] <ecapriolo> People are forgetting/do not understand that
>>>> hive-exec needs to get sent via the distributed cache
>>>> [11:47:57] <noland> When we implement range joins, guava has a RangeMap
>>> that
>>>> we'll need.
>>>> [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything
>>>> down
>>>> [11:48:11] <noland> Do we ship it every time?
>>>> [11:48:25] <noland> Cause we only have to ship it once per version of
>> the
>>>> jar.
>>>> [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib
>>> as
>>>> well
>>>> [11:48:46] <ecapriolo> hive will not work without it
>>>> [11:49:11] <ecapriolo> People are just focused
>>>> feature-feature-feature...bigger...bigger bigger
>>>> [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has
>> quit
>>>> IRC: Quit: Leaving
>>>> [11:49:27] <noland> yeah maven modules will definitely help us
>> understand
>>>> who depends on what.
>>>> [11:49:28] <ecapriolo> Next up kryo
>>>> [11:49:51] <noland> I agree there is a lot of tech debt that needs
>> paying
>>>> [11:50:30] <ecapriolo> So those are all the high level things I want to
>>>> tackle
>>>> [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
>>>> code, build a better non conflicting hive-exec jar
>>>> [11:51:10] <noland> That sounds good. Once we hack on github for a
>> while
>>>> it'd be nice to develop a brief high level plan on how to implement
>>>> [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
>>>> scopes like provided etc
>>>> [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
>>>> pulling in the world
>>>> [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
>>>> 
>>>> 
>>>> On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>> 
>>>>> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
>>>>> am growing tired of how long Hive's builds take.
>>>>> 
>>>>> I have started playing with this by creating a simple multi-module
>>>>> project and copying stuff as I go. I have ported minimal versions of shims
>>>>> and common, and I have all the tests in common almost running.
>>>>> 
>>>>> Q. This is going to be ugly, hacky work for a while. I was thinking it
>>>>> should be a branch, but it is just going to be a mess of moves and
>>>>> copies, not really something you can diff.
>>>>> 
>>>>> Is anyone else interested in working on this as well? If so, I think we
>>>>> can just set up a GitHub repository and I can arrange for anyone to have
>>>>> access to it.
>>>>> 
>>>>> Thanks,
>>>>> Edward
>>>>> 
>>>>> 
>>>>> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>> 
>>>>>> "Some of the hard part was that some of the test classes are in the
>>> wrong
>>>>>> module that references classes in a later module."
>>>>>> 
>>>>>> I think the modules will have to be able to reference each other in
>>> many
>>>>>> cases. Serde and QL are tightly coupled. QL is really too large and
>> we
>>>>>> should find a way to cut that up.
>>>>>> 
>>>>>> Part of this problem is the q tests.
>>>>>>
>>>>>> I think one way to handle this is to only allow unit tests inside each
>>>>>> module. I imagine all the q tests would run in a final module,
>>>>>> hive-qtest, or possibly in two final modules:
>>>>>> hive-qtest
>>>>>> hive-qtest-extra (tangential things like UDFs and input formats not
>>>>>> core to Hive)
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <omalley@apache.org> wrote:
>>>>>> 
>>>>>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
>>>>>>> kulkarni.swarnim@gmail.com> wrote:
>>>>>>> 
>>>>>>>>> I'd like to propose we move towards Maven.
>>>>>>>> 
>>>>>>>> Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro,
>>>>>>>> etc.) are Maven based.
>>>>>>>> 
>>>>>>> 
>>>>>>> A big +1 from me too. I actually took a pass at it a couple of
>> months
>>>>>>> ago.
>>>>>>> Some of the hard part was that some of the test classes are in the
>>> wrong
>>>>>>> module that references classes in a later module. Obviously that
>>>>>>> prevents
>>>>>>> any kind of modular build.
>>>>>>> 
>>>>>>> An additional plus is that Maven includes tools to correct the
>>>>>>> project and module dependencies.
>>>>>>> 
>>>>>>> -- Owen
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
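The "JDBC driver" approach to shims floated in the IRC log above (compile every module against one shim interface, then load a Hadoop-specific implementation by class name at runtime) can be illustrated with a minimal Java sketch. It is not Hive's actual shim code: ShimLoaderSketch, HadoopShims, Hadoop20Shims, Hadoop23Shims, and the version-to-class-name map are hypothetical, and the two implementations are nested in one file only so the example runs on its own. In a real multi-module build each implementation would presumably live in its own module compiled against its own Hadoop version, with the version string coming from Hadoop's org.apache.hadoop.util.VersionInfo.getVersion().

```java
import java.util.Map;

// Hypothetical sketch of JDBC-driver-style shim loading; all names are illustrative.
public class ShimLoaderSketch {

    /** The only type that other modules would compile against. */
    public interface HadoopShims {
        String describe();
    }

    /** Stand-in implementations; in a real build each would sit in its own module/jar. */
    public static class Hadoop20Shims implements HadoopShims {
        public String describe() { return "shims for Hadoop 0.20"; }
    }

    public static class Hadoop23Shims implements HadoopShims {
        public String describe() { return "shims for Hadoop 0.23"; }
    }

    // Assumed mapping from Hadoop major version to implementation class name.
    private static final Map<String, String> SHIM_CLASSES = Map.of(
        "0.20", ShimLoaderSketch.class.getName() + "$Hadoop20Shims",
        "0.23", ShimLoaderSketch.class.getName() + "$Hadoop23Shims");

    /** Extracts the major version, e.g. "0.20" from "0.20.2". */
    static String majorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + hadoopVersion);
        }
        return parts[0] + "." + parts[1];
    }

    /** Picks the implementation by name and instantiates it reflectively. */
    public static HadoopShims load(String hadoopVersion) {
        String className = SHIM_CLASSES.get(majorVersion(hadoopVersion));
        if (className == null) {
            throw new IllegalArgumentException("No shim for Hadoop " + hadoopVersion);
        }
        try {
            Class<?> clazz = Class.forName(className);
            return (HadoopShims) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("0.20.2").describe());
        System.out.println(load("0.23.1").describe());
    }
}
```

Running main prints "shims for Hadoop 0.20" followed by "shims for Hadoop 0.23". Because callers refer to an implementation only by its class-name string, no code outside the shim modules needs a compile-time reference to a particular Hadoop version; the Maven-profile alternative mentioned in the log would instead choose the implementation at build time.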



Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

Just an update. This is going very well:

[INFO] Nothing to compile - all classes are up to date
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hive ....................................... SUCCESS [0.002s]
[INFO] hive-shims-x ...................................... SUCCESS [1.210s]
[INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
[INFO] hive-common ....................................... SUCCESS [0.082s]
[INFO] hive-serde ........................................ SUCCESS [2.521s]
[INFO] hive-metastore .................................... SUCCESS [10.818s]
[INFO] hive-exec ......................................... SUCCESS [4.521s]
[INFO] hive-avro ......................................... SUCCESS [1.582s]
[INFO] hive-zookeeper .................................... SUCCESS [0.519s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21.613s
[INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
[INFO] Final Memory: 39M/408M


I did take some shortcuts and disabled some tests, but we can build Hive
very fast, including incremental builds. We are also using Maven plugins to
run the ANTLR, Thrift, Protobuf, and DataNucleus steps, and we build those every time.


On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xz...@cloudera.com> wrote:

> Thanks, Edward.
>
> I'm a big +1 on mavenizing Hive. Hive has long since reached a point where it's hard
> to manage its build using ant. I'd like to help on this too.
>
> Thanks,
> Xuefu
>
>
> On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> > For those interested in pitching in.
> > https://github.com/edwardcapriolo/hive
> >
> >
> >
> > On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > > Summary from hive-irc channel. Minor edits for spell check/grammar.
> > >
> > > The last 10 lines are a summary of the key points.
> > >
> > > [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive
> in
> > > maven?
> > > [11:01:06] smonchi [~
> > > ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit
> IRC:
> > > Quit: ... 'cause there is no patch for human stupidity ...
> > > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
> > > [11:10:22] <noland> I saw you created the jira but haven't had time to
> > look
> > > [11:10:32] <ecapriolo> So I found a few things
> > > [11:10:49] <ecapriolo> In common there are one or two tests that
> > actually
> > > fork a process :)
> > > [11:10:56] <ecapriolo> and use build.test.resources
> > > [11:11:12] <ecapriolo> Some serde tests use some methods from ql
> > > [11:11:27] <ecapriolo> and shims really needs a separate hadoop test
> shim
> > > [11:11:32] <ecapriolo> But that is all simple stuff
> > > [11:11:47] <ecapriolo> The biggest problem is I do not know how to
> solve
> > > shims with maven
> > > [11:11:50] <ecapriolo> do you have any ideas
> > > [11:11:52] <ecapriolo> ?
> > > [11:13:00] <noland> That one is going to be a challenge. It might be
> that
> > > in that section we have to drop down to ant
> > > [11:14:44] <noland> Is it a requirement that we build both the .20 and
> > .23
> > > shims for a "package" as we do today?
> > > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
> > > [11:16:59] <ecapriolo> We separate out the interface of shims
> > > [11:17:22] <ecapriolo> And then at runtime we drop in a driver
> > implementing
> > > [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote
> host
> > > closed the connection
> > > [11:17:36] <ecapriolo> That or we could use maven's profile system
> > > [11:18:09] <ecapriolo> It seems that everything else can actually link
> > > against hadoop-0.20.2 as a provided dependency
> > > [11:18:37] <noland> Yeah either would work. The driver method would
> > > probably require us to use ant to build both the drivers?
> > > [11:18:44] <noland> I am a fan of mvn profiles
> > > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out
> into
> > > its own project, not a module
> > > [11:19:10] <ecapriolo> to achieve that jdbc thing
> > > [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking
> to
> > > farm that out to someone smart...like you :)
> > > [11:19:33] <noland> :)
> > > [11:19:47] <ecapriolo> All I know is that we need a test shim because
> > > HadoopShim requires hadoop-test jars
> > > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
> > > [11:20:48] <ecapriolo> Is this something you want to help with? I was
> > > thinking of spinning up a github
> > > [11:20:50] <noland> I think that the separate projects would work and
> > > perhaps nicely.
> > > [11:21:01] <noland> Yeah I'd be interested in helping!
> > > [11:21:17] <noland> But I am going on vacation starting next week for
> > > about 10 days
> > > [11:21:27] <ecapriolo> Ah cool where are you going?
> > > [11:21:37] <noland> Netherlands
> > > [11:21:42] <noland> Biking around and such
> > > [11:23:52] <noland> The one thing I was thinking about with regards to
> a
> > > branch is keeping history. We'll want to keep history for the files but
> > > AFAICT svn doesn't understand git mv.
> > > [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
> > > [11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com]
> has
> > > quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> > > [11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we
> > will
> > > do this in git
> > > [11:36:11] <ecapriolo> I just see that we are going to have to hack
> stuff
> > > up and it is not the type of work that lends itself well to branches.
> > > [11:36:17] <noland> Ahh ok
> > > [11:36:56] <ecapriolo> Once we come up with a solution for the shims,
> and
> > > we have something that can reasonably build and test hive we can figure
> > out
> > > how to apply that to a branch/trunk
> > > [11:36:58] <noland> yeah so just do a POC on github and then implement
> on
> > > svn
> > > [11:37:05] <noland> cool
> > > [11:37:29] <ecapriolo> Along the way we can probably find things that
> we
> > > can do like that common test I found and other minor things
> > > [11:37:41] <noland> sounds good
> > > [11:37:50] <ecapriolo> Those we can likely just commit into the current
> > > trunk and I will file issues for those now
> > > [11:37:58] <noland> cool
> > > [11:38:41] <ecapriolo> But yeah man. I just can't take the project as it
> is
> > > now
> > > [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
> > > everything!
> > > [11:38:53] <ecapriolo> It's like WTF
> > > [11:39:09] <ecapriolo> Running one test takes like 3 minutes
> > > [11:39:12] <ecapriolo> it's out of control
> > > [11:39:23] <noland> LOL
> > > [11:39:29] <noland> I agree 110%
> > > [11:39:32] <ecapriolo> eclipse was not always like that I am not sure
> how
> > > the hell it happened
> > > [11:39:51] <noland> The eclipse sep thing is so harmful
> > > [11:40:08] <noland> dep thing that is
> > > [11:40:12] <ecapriolo> I mean command line ant was always bad, but you
> > > used to be able to work in eclipse without having to rebuild everything
> > > every change/test
> > > [11:40:39] <noland> Yeah the first thing I do these days is disable the
> > > ant builder
> > > [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
> > > [11:40:55] <noland> it starts compiling while you are still working and
> > > blocks for minutes
> > > [11:41:02] <ecapriolo> Right that is what I mean
> > > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the
> project
> > > [11:41:14] <noland> yeah you can remove it in project…one sec
> > > [11:41:17] <ecapriolo> perm gen
> > > [11:41:20] <ecapriolo> ant builder
> > > [11:41:32] <noland> project -> properties -> builders
> > > [11:41:34] <ecapriolo> hive does not build offline anymore
> > > [11:41:37] <noland> yeah
> > > [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
> > > gotten really really bad
> > > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
> > > non-essentials
> > > [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
> > > support custom formats
> > > [11:42:30] <ecapriolo> that is going into its own module
> > > [11:42:43] <ecapriolo> Going to rip out all the udfs except between and
> > or.
> > > [11:43:50] <noland> yeah it'd be nice to have those items in their own
> > > modules so you can just build/test them when you want
> > > [11:44:12] <ecapriolo> hbase zookeeper locking
> > > [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote
> host
> > > closed the connection
> > > [11:44:44] <noland> yeah for sure
> > > [11:45:04] <noland> I think the default for testing should be the in
> > > process locking
> > > [11:45:10] <ecapriolo> Absolutely.
> > > [11:45:40] <ecapriolo> The other issue I want to tackle is
> hive-exec.jar
> > > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
> > > [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
> > > and commons-utils. All those things need to be packaged into
> > > non-conflicting packages
> > > [11:46:58] <noland> I haven't looked at how we build that yet but I
> agree
> > > it'd be nice if we could jar-jar things like guava
> > > [11:47:12] <noland> so we can actually use them on server side
> > > [11:47:16] <ecapriolo> We don't really need guava. It's probably just
> used
> > > for one tiny thing
> > > [11:47:43] <ecapriolo> People are forgetting/do not understand that
> > > hive-exec needs to get sent via the distributed cache
> > > [11:47:57] <noland> When we implement range joins, guava has a RangeMap
> > that
> > > we'll need.
> > > [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything
> > > down
> > > [11:48:11] <noland> Do we ship it every time?
> > > [11:48:25] <noland> Cause we only have to ship it once per version of
> the
> > > jar.
> > > [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib
> > as
> > > well
> > > [11:48:46] <ecapriolo> hive will not work without it
> > > [11:49:11] <ecapriolo> People are just focused
> > > feature-feature-feature...bigger...bigger bigger
> > > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has
> quit
> > > IRC: Quit: Leaving
> > > [11:49:27] <noland> yeah maven modules will definitely help us
> understand
> > > who depends on what.
> > > [11:49:28] <ecapriolo> Next up kryo
> > > [11:49:51] <noland> I agree there is a lot of tech debt that needs
> paying
> > > [11:50:30] <ecapriolo> So those are all the high level things I want to
> > > tackle
> > > [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
> > > code, build a better non conflicting hive-exec jar
> > > [11:51:10] <noland> That sounds good. Once we hack on github for a
> while
> > > it'd be nice to develop a brief high level plan on how to implement
> > > [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
> > > scopes like provided etc
> > > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
> > > pulling in the world
> > > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
> > >
> > >
> > > On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >
> > >> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
> > >> am growing tired of how long Hive's builds take.
> > >>
> > >> I have started playing with this by creating a simple multi-module
> > >> project and copying stuff as I go. I have ported minimal versions of shims
> > >> and common, and I have all the tests in common almost running.
> > >>
> > >> Q. This is going to be ugly, hacky work for a while. I was thinking it
> > >> should be a branch, but it is just going to be a mess of moves and
> > >> copies, not really something you can diff.
> > >>
> > >> Is anyone else interested in working on this as well? If so, I think we
> > >> can just set up a GitHub repository and I can arrange for anyone to have
> > >> access to it.
> > >>
> > >> Thanks,
> > >> Edward
> > >>
> > >>
> > >> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >>
> > >>> "Some of the hard part was that some of the test classes are in the
> > wrong
> > >>> module that references classes in a later module."
> > >>>
> > >>> I think the modules will have to be able to reference each other in
> > many
> > >>> cases. Serde and QL are tightly coupled. QL is really too large and
> we
> > >>> should find a way to cut that up.
> > >>>
> > >>> Part of this problem is the q tests.
> > >>>
> > >>> I think one way to handle this is to only allow unit tests inside each
> > >>> module. I imagine all the q tests would run in a final module,
> > >>> hive-qtest, or possibly in two final modules:
> > >>> hive-qtest
> > >>> hive-qtest-extra (tangential things like UDFs and input formats not
> > >>> core to Hive)
> > >>>
> > >>>
> > >>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <omalley@apache.org> wrote:
> > >>>
> > >>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
> > >>>> kulkarni.swarnim@gmail.com> wrote:
> > >>>>
> > >>>> > > I'd like to propose we move towards Maven.
> > >>>> >
> > >>>> > Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro,
> > >>>> > etc.) are Maven based.
> > >>>> >
> > >>>>
> > >>>> A big +1 from me too. I actually took a pass at it a couple of
> months
> > >>>> ago.
> > >>>> Some of the hard part was that some of the test classes are in the
> > wrong
> > >>>> module that references classes in a later module. Obviously that
> > >>>> prevents
> > >>>> any kind of modular build.
> > >>>>
> > >>>> An additional plus is that Maven includes tools to correct the
> > >>>> project and module dependencies.
> > >>>>
> > >>>> -- Owen
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
>
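Owen's point that a test class referencing a later module "prevents any kind of modular build" follows from how a multi-module build is ordered: every module must be built after the modules it depends on, which is a topological sort, and a dependency cycle leaves no valid order. The Java sketch below illustrates this with Kahn's algorithm. The module names come from the reactor summary above, but the dependency edges are assumptions made for illustration, ReactorOrderSketch, buildOrder, and hiveModules are hypothetical names, and Maven's own reactor logic is more involved than this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ordering modules so each builds after its dependencies.
public class ReactorOrderSketch {

    /** Returns a build order (Kahn's algorithm), or throws if the graph has a cycle. */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unbuilt dependencies per module
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!deps.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown module: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start with modules that depend on nothing.
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((module, n) -> {
            if (n == 0) {
                ready.add(module);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.poll();
            order.add(module);
            // Building this module unblocks anything that was only waiting on it.
            for (String next : dependents.getOrDefault(module, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("Cyclic module dependency; no valid build order");
        }
        return order;
    }

    /** Module names from the reactor summary; the edges are assumed for illustration. */
    static Map<String, List<String>> hiveModules() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("hive-shims-x", List.of());
        deps.put("hive-shims-20", List.of("hive-shims-x"));
        deps.put("hive-common", List.of("hive-shims-x"));
        deps.put("hive-serde", List.of("hive-common"));
        deps.put("hive-metastore", List.of("hive-serde"));
        deps.put("hive-exec", List.of("hive-metastore", "hive-serde"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(buildOrder(hiveModules()));

        // A serde test that calls into ql code adds a back edge serde -> exec.
        Map<String, List<String>> withCycle = new LinkedHashMap<>(hiveModules());
        withCycle.put("hive-serde", List.of("hive-common", "hive-exec"));
        try {
            buildOrder(withCycle);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the assumed edges, main first prints [hive-shims-x, hive-shims-20, hive-common, hive-serde, hive-metastore, hive-exec], matching the order in the reactor summary. Making hive-serde also depend on hive-exec (standing in for a serde test that uses ql code) produces the cycle error instead, which is why such tests have to move to a later module, or to a final module such as the hive-qtest idea above.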

Re: [Discuss] project chop up

Posted by Xuefu Zhang <xz...@cloudera.com>.

Thanks, Edward.

I'm a big +1 on mavenizing Hive. Hive has long since reached a point where it's hard
to manage its build using ant. I'd like to help on this too.

Thanks,
Xuefu


On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <ed...@gmail.com> wrote:

> For those interested in pitching in.
> https://github.com/edwardcapriolo/hive
>
>
>
> On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> > Summary from hive-irc channel. Minor edits for spell check/grammar.
> >
> > The last 10 lines are a summary of the key points.
> >
> > [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in
> > maven?
> > [11:01:06] smonchi [~
> > ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC:
> > Quit: ... 'cause there is no patch for human stupidity ...
> > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
> > [11:10:22] <noland> I saw you created the jira but haven't had time to
> look
> > [11:10:32] <ecapriolo> So I found a few things
> > [11:10:49] <ecapriolo> In common there are one or two tests that
> actually
> > fork a process :)
> > [11:10:56] <ecapriolo> and use build.test.resources
> > [11:11:12] <ecapriolo> Some serde tests use some methods from ql
> > [11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
> > [11:11:32] <ecapriolo> But that is all simple stuff
> > [11:11:47] <ecapriolo> The biggest problem is I do not know how to solve
> > shims with maven
> > [11:11:50] <ecapriolo> do you have any ideas
> > [11:11:52] <ecapriolo> ?
> > [11:13:00] <noland> That one is going to be a challenge. It might be that
> > in that section we have to drop down to ant
> > [11:14:44] <noland> Is it a requirement that we build both the .20 and
> .23
> > shims for a "package" as we do today?
> > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
> > [11:16:59] <ecapriolo> We separate out the interface of shims
> > [11:17:22] <ecapriolo> And then at runtime we drop in a driver
> implementing
> > [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
> > closed the connection
> > [11:17:36] <ecapriolo> That or we could use maven's profile system
> > [11:18:09] <ecapriolo> It seems that everything else can actually link
> > against hadoop-0.20.2 as a provided dependency
> > [11:18:37] <noland> Yeah either would work. The driver method would
> > probably require us to use ant to build both the drivers?
> > [11:18:44] <noland> I am a fan of mvn profiles
> > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into
> > its own project, not a module
> > [11:19:10] <ecapriolo> to achieve that jdbc thing
> > [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to
> > farm that out to someone smart...like you :)
> > [11:19:33] <noland> :)
> > [11:19:47] <ecapriolo> All I know is that we need a test shim because
> > HadoopShim requires hadoop-test jars
> > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
> > [11:20:48] <ecapriolo> Is this something you want to help with? I was
> > thinking of spinning up a github
> > [11:20:50] <noland> I think that the separate projects would work and
> > perhaps nicely.
> > [11:21:01] <noland> Yeah I'd be interested in helping!
> > [11:21:17] <noland> But I am going on vacation starting next week for
> > about 10 days
> > [11:21:27] <ecapriolo> Ah cool where are you going?
> > [11:21:37] <noland> Netherlands
> > [11:21:42] <noland> Biking around and such
> > [11:23:52] <noland> The one thing I was thinking about with regards to a
> > branch is keeping history. We'll want to keep history for the files but
> > AFAICT svn doesn't understand git mv.
> > [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
> > [11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com] has
> > quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> > [11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we
> will
> > do this in git
> > [11:36:11] <ecapriolo> I just see that we are going to have to hack stuff
> > up and it is not the type of work that lends itself well to branches.
> > [11:36:17] <noland> Ahh ok
> > [11:36:56] <ecapriolo> Once we come up with a solution for the shims, and
> > we have something that can reasonably build and test hive we can figure
> out
> > how to apply that to a branch/trunk
> > [11:36:58] <noland> yeah so just do a POC on github and then implement on
> > svn
> > [11:37:05] <noland> cool
> > [11:37:29] <ecapriolo> Along the way we can probably find things that we
> > can do like that common test I found and other minor things
> > [11:37:41] <noland> sounds good
> > [11:37:50] <ecapriolo> Those we can likely just commit into the current
> > trunk and I will file issues for those now
> > [11:37:58] <noland> cool
> > [11:38:41] <ecapriolo> But yeah man. I just can't take the project as it is
> > now
> > [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
> > everything!
> > [11:38:53] <ecapriolo> It's like WTF
> > [11:39:09] <ecapriolo> Running one test takes like 3 minutes
> > [11:39:12] <ecapriolo> it's out of control
> > [11:39:23] <noland> LOL
> > [11:39:29] <noland> I agree 110%
> > [11:39:32] <ecapriolo> eclipse was not always like that I am not sure how
> > the hell it happened
> > [11:39:51] <noland> The eclipse sep thing is so harmful
> > [11:40:08] <noland> dep thing that is
> > [11:40:12] <ecapriolo> I mean command line ant was always bad, but you
> > used to be able to work in eclipse without having to rebuild everything
> > every change/test
> > [11:40:39] <noland> Yeah the first thing I do these days is disable the
> > ant builder
> > [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
> > [11:40:55] <noland> it starts compiling while you are still working and
> > blocks for minutes
> > [11:41:02] <ecapriolo> Right that is what I mean
> > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
> > [11:41:14] <noland> yeah you can remove it in project…one sec
> > [11:41:17] <ecapriolo> perm gen
> > [11:41:20] <ecapriolo> ant builder
> > [11:41:32] <noland> project -> properties -> builders
> > [11:41:34] <ecapriolo> hive does not build offline anymore
> > [11:41:37] <noland> yeah
> > [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
> > gotten really really bad
> > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
> > non-essentials
> > [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
> > support custom formats
> > [11:42:30] <ecapriolo> that is going into its own module
> > [11:42:43] <ecapriolo> Going to rip out all the udfs except between and or.
> > [11:43:50] <noland> yeah it'd be nice to have those items in their own
> > modules so you can just build/test them when you want
> > [11:44:12] <ecapriolo> hbase zookeeper locking
> > [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
> > closed the connection
> > [11:44:44] <noland> yeah for sure
> > [11:45:04] <noland> I think the default for testing should be the in
> > process locking
> > [11:45:10] <ecapriolo> Absolutely.
> > [11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
> > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
> > [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
> > and commons-utils all those things need to be packaged into
> > non-conflicting packages
> > [11:46:58] <noland> I haven't looked at how we build that yet but I agree
> > it'd be nice if we could jar-jar things like guava
> > [11:47:12] <noland> so we can actually use them on server side
> > [11:47:16] <ecapriolo> We don't really need guava. It's probably just used
> > for one tiny thing
> > [11:47:43] <ecapriolo> People are forgetting/do not understand that
> > hive-exec needs to get sent via the distributed cache
> > [11:47:57] <noland> When we implement range joins they have a RangeMap
> > that we'll need.
> > [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything
> > down
> > [11:48:11] <noland> Do we ship it every time?
> > [11:48:25] <noland> Cause we only have to ship it once per version of the
> > jar.
> > [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib
> > as well
> > [11:48:46] <ecapriolo> hive will not work without it
> > [11:49:11] <ecapriolo> People are just focused
> > feature-feature-feature...bigger...bigger bigger
> > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit
> > IRC: Quit: Leaving
> > [11:49:27] <noland> yeah maven modules will definitely help us understand
> > who depends on what.
> > [11:49:28] <ecapriolo> Next up Kryo
> > [11:49:51] <noland> I agree there is a lot of tech debt that needs paying
> > [11:50:30] <ecapriolo> So those are all the high level things I want to
> > tackle
> > [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
> > code, build a better non conflicting hive-exec jar
> > [11:51:10] <noland> That sounds good. Once we hack on github for a while
> > it'd be nice to develop a brief high level plan on how to implement
> > [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
> > scopes like provided etc
> > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
> > pulling in the world
> > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
>

Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

For those interested in pitching in.
https://github.com/edwardcapriolo/hive



On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Summary from hive-irc channel. Minor edits for spell check/grammar.
>
> The last 10 lines are a summary of the key points.
>
> [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in
> maven?
> [11:01:06] smonchi [~ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC:
> Quit: ... 'cause there is no patch for human stupidity ...
> [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
> [11:10:22] <noland> I saw you created the jira but haven't had time to look
> [11:10:32] <ecapriolo> So I found a few things
> [11:10:49] <ecapriolo> In common there are one or two tests that actually
> fork a process :)
> [11:10:56] <ecapriolo> and use build.test.resources
> [11:11:12] <ecapriolo> Some serde, uses some methods from ql in testing
> [11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
> [11:11:32] <ecapriolo> But that is all simple stuff
> [11:11:47] <ecapriolo> The biggest problem is I do not know how to solve
> shims with maven
> [11:11:50] <ecapriolo> do you have any ideas
> [11:11:52] <ecapriolo> ?
> [11:13:00] <noland> That one is going to be a challenge. It might be that
> in that section we have to drop down to ant
> [11:14:44] <noland> Is it a requirement that we build both the .20 and .23
> shims for a "package" as we do today?
> [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
> [11:16:59] <ecapriolo> We separate out the interface of shims
> [11:17:22] <ecapriolo> And then at runtime we drop in a driver implementing
> [11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
> closed the connection
> [11:17:36] <ecapriolo> That or we could use maven's profile system
> [11:18:09] <ecapriolo> It seems that everything else can actually link
> against hadoop-0.20.2 as a provided dependency
> [11:18:37] <noland> Yeah either would work. The driver method would
> probably require us to use ant to build both the drivers?
> [11:18:44] <noland> I am a fan of mvn profiles
> [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into
> its own project, not a module
> [11:19:10] <ecapriolo> to achieve that jdbc thing
> [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to
> farm that out to someone smart...like you :)
> [11:19:33] <noland> :)
> [11:19:47] <ecapriolo> All I know is that we need a test shim because
> HadoopShim requires hadoop-test jars
> [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
> [11:20:48] <ecapriolo> Is this something you want to help with? I was
> thinking of spinning up a github
> [11:20:50] <noland> I think that the separate projects would work and
> perhaps nicely.
> [11:21:01] <noland> Yeah I'd be interested in helping!
> [11:21:17] <noland> But I am going on vacation starting next week for
> about 10 days
> [11:21:27] <ecapriolo> Ah cool where are you going?
> [11:21:37] <noland> Netherlands
> [11:21:42] <noland> Biking around and such
> [11:23:52] <noland> The one thing I was thinking about with regards to a
> branch is keeping history. We'll want to keep history for the files but
> AFAICT svn doesn't understand git mv.
> [11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
> [11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com] has
> quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> [11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we will
> do this in git
> [11:36:11] <ecapriolo> I just see that we are going to have to hack stuff
> up and it is not the type of work that lends itself well to branches.
> [11:36:17] <noland> Ahh ok
> [11:36:56] <ecapriolo> Once we come up with a solution for the shims, and
> we have something that can reasonably build and test hive we can figure out
> how to apply that to a branch/trunk
> [11:36:58] <noland> yeah so just do a POC on github and then implement on
> svn
> [11:37:05] <noland> cool
> [11:37:29] <ecapriolo> Along the way we can probably find things that we
> can do like that common test I found and other minor things
> [11:37:41] <noland> sounds good
> [11:37:50] <ecapriolo> Those we can likely just commit into the current
> trunk and I will file issues for those now
> [11:37:58] <noland> cool
> [11:38:41] <ecapriolo> But yea man. I just can't take the project as it is
> now
> [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
> everything!
> [11:38:53] <ecapriolo> It's like WTF
> [11:39:09] <ecapriolo> Running one test takes like 3 minutes
> [11:39:12] <ecapriolo> its out of control
> [11:39:23] <noland> LOL
> [11:39:29] <noland> I agree 110%
> [11:39:32] <ecapriolo> eclipse was not always like that I am not sure how
> the hell it happened
> [11:39:51] <noland> The eclipse sep thing is so harmful
> [11:40:08] <noland> dep thing that is
> [11:40:12] <ecapriolo> I mean command line ant was always bad, but you
> used to be able to work in eclipse without having to rebuild everything
> every change/test
> [11:40:39] <noland> Yeah the first thing I do these days is disable the
> ant builder
> [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
> [11:40:55] <noland> it starts compiling while you are still working and
> blocks for minutes
> [11:41:02] <ecapriolo> Right that is what I mean
> [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
> [11:41:14] <noland> yeah you can remove it in project…one sec
> [11:41:17] <ecapriolo> perm gen
> [11:41:20] <ecapriolo> ant builder
> [11:41:32] <noland> project -> properties -> builders
> [11:41:34] <ecapriolo> hive does not build offline anymore
> [11:41:37] <noland> yeah
> [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
> gotten really really bad
> [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
> non-essentials
> [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
> support custom formats
> [11:42:30] <ecapriolo> that is going into its own module
> [11:42:43] <ecapriolo> Going to rip out all the udfs except between and or.
> [11:43:50] <noland> yeah it'd be nice to have those items in their own
> modules so you can just build/test them when you want
> [11:44:12] <ecapriolo> hbase zookeeper locking
> [11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
> closed the connection
> [11:44:44] <noland> yeah for sure
> [11:45:04] <noland> I think the default for testing should be the in
> process locking
> [11:45:10] <ecapriolo> Absolutely.
> [11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
> [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
> [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
> and commons-utils all those things need to be packaged into non-conflicting
> packages
> [11:46:58] <noland> I haven't looked at how we build that yet but I agree
> it'd be nice if we could jar-jar things like guava
> [11:47:12] <noland> so we can actually use them on server side
> [11:47:16] <ecapriolo> We don't really need guava. It's probably just used
> for one tiny thing
> [11:47:43] <ecapriolo> People are forgetting/do not understand that
> hive-exec needs to get sent via the distributed cache
> [11:47:57] <noland> When we implement range joins they have a RangeMap that
> we'll need.
> [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything
> down
> [11:48:11] <noland> Do we ship it every time?
> [11:48:25] <noland> Cause we only have to ship it once per version of the
> jar.
> [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib as
> well
> [11:48:46] <ecapriolo> hive will not work without it
> [11:49:11] <ecapriolo> People are just focused
> feature-feature-feature...bigger...bigger bigger
> [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit
> IRC: Quit: Leaving
> [11:49:27] <noland> yeah maven modules will definitely help us understand
> who depends on what.
> [11:49:28] <ecapriolo> Next up Kryo
> [11:49:51] <noland> I agree there is a lot of tech debt that needs paying
> [11:50:30] <ecapriolo> So those are all the high level things I want to
> tackle
> [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
> code, build a better non conflicting hive-exec jar
> [11:51:10] <noland> That sounds good. Once we hack on github for a while
> it'd be nice to develop a brief high level plan on how to implement
> [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
> scopes like provided etc
> [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
> pulling in the world
> [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
>

Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

Summary from hive-irc channel. Minor edits for spell check/grammar.

The last 10 lines are a summary of the key points.

[10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in
maven?
[11:01:06] smonchi [~ronin@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC:
Quit: ... 'cause there is no patch for human stupidity ...
[11:10:04] <noland> ecapriolo: yeah that sounds good to me!
[11:10:22] <noland> I saw you created the jira but haven't had time to look
[11:10:32] <ecapriolo> So I found a few things
[11:10:49] <ecapriolo> In common there are one or two tests that actually
fork a process :)
[11:10:56] <ecapriolo> and use build.test.resources
[11:11:12] <ecapriolo> Some serde, uses some methods from ql in testing
[11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
[11:11:32] <ecapriolo> But that is all simple stuff
[11:11:47] <ecapriolo> The biggest problem is I do not know how to solve
shims with maven
[11:11:50] <ecapriolo> do you have any ideas
[11:11:52] <ecapriolo> ?
[11:13:00] <noland> That one is going to be a challenge. It might be that
in that section we have to drop down to ant
[11:14:44] <noland> Is it a requirement that we build both the .20 and .23
shims for a "package" as we do today?
[11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
[11:16:59] <ecapriolo> We separate out the interface of shims
[11:17:22] <ecapriolo> And then at runtime we drop in a driver implementing
[11:17:34] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
closed the connection
[11:17:36] <ecapriolo> That or we could use maven's profile system
[11:18:09] <ecapriolo> It seems that everything else can actually link
against hadoop-0.20.2 as a provided dependency
[11:18:37] <noland> Yeah either would work. The driver method would
probably require us to use ant to build both the drivers?
[11:18:44] <noland> I am a fan of mvn profiles
[11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into
its own project, not a module
[11:19:10] <ecapriolo> to achieve that jdbc thing
[11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to
farm that out to someone smart...like you :)
[11:19:33] <noland> :)
[11:19:47] <ecapriolo> All I know is that we need a test shim because
HadoopShim requires hadoop-test jars
[11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
[11:20:48] <ecapriolo> Is this something you want to help with? I was
thinking of spinning up a github
[11:20:50] <noland> I think that the separate projects would work and
perhaps nicely.
[11:21:01] <noland> Yeah I'd be interested in helping!
[11:21:17] <noland> But I am going on vacation starting next week for about
10 days
[11:21:27] <ecapriolo> Ah cool where are you going?
[11:21:37] <noland> Netherlands
[11:21:42] <noland> Biking around and such
[11:23:52] <noland> The one thing I was thinking about with regards to a
branch is keeping history. We'll want to keep history for the files but
AFAICT svn doesn't understand git mv.
[11:24:16] Wertax [~wertax@wolfkamp.xs4all.nl] has joined #hive
[11:31:19] jeromatron [~textual@host90-152-1-162.ipv4.regusnet.com] has
quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
[11:35:49] <ecapriolo> noland: Right, I do not plan to suggest that we will
do this in git
[11:36:11] <ecapriolo> I just see that we are going to have to hack stuff
up and it is not the type of work that lends itself well to branches.
[11:36:17] <noland> Ahh ok
[11:36:56] <ecapriolo> Once we come up with a solution for the shims, and
we have something that can reasonably build and test hive we can figure out
how to apply that to a branch/trunk
[11:36:58] <noland> yeah so just do a POC on github and then implement on
svn
[11:37:05] <noland> cool
[11:37:29] <ecapriolo> Along the way we can probably find things that we
can do like that common test I found and other minor things
[11:37:41] <noland> sounds good
[11:37:50] <ecapriolo> Those we can likely just commit into the current
trunk and I will file issues for those now
[11:37:58] <noland> cool
[11:38:41] <ecapriolo> But yea man. I just can't take the project as it is
now
[11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
everything!
[11:38:53] <ecapriolo> It's like WTF
[11:39:09] <ecapriolo> Running one test takes like 3 minutes
[11:39:12] <ecapriolo> its out of control
[11:39:23] <noland> LOL
[11:39:29] <noland> I agree 110%
[11:39:32] <ecapriolo> eclipse was not always like that I am not sure how
the hell it happened
[11:39:51] <noland> The eclipse sep thing is so harmful
[11:40:08] <noland> dep thing that is
[11:40:12] <ecapriolo> I mean command line ant was always bad, but you used
to be able to work in eclipse without having to rebuild everything every
change/test
[11:40:39] <noland> Yeah the first thing I do these days is disable the ant
builder
[11:40:52] <ecapriolo> Ow... I did not really know that was a thing
[11:40:55] <noland> it starts compiling while you are still working and
blocks for minutes
[11:41:02] <ecapriolo> Right that is what I mean
[11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
[11:41:14] <noland> yeah you can remove it in project…one sec
[11:41:17] <ecapriolo> perm gen
[11:41:20] <ecapriolo> ant builder
[11:41:32] <noland> project -> properties -> builders
[11:41:34] <ecapriolo> hive does not build offline anymore
[11:41:37] <noland> yeah
[11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
gotten really really bad
[11:42:09] <ecapriolo> Also what I plan on doing is stripping out
non-essentials
[11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
support custom formats
[11:42:30] <ecapriolo> that is going into its own module
[11:42:43] <ecapriolo> Going to rip out all the udfs except between and or.
[11:43:50] <noland> yeah it'd be nice to have those items in their own
modules so you can just build/test them when you want
[11:44:12] <ecapriolo> hbase zookeeper locking
[11:44:31] Wertax [~wertax@wolfkamp.xs4all.nl] has quit IRC: Remote host
closed the connection
[11:44:44] <noland> yeah for sure
[11:45:04] <noland> I think the default for testing should be the in
process locking
[11:45:10] <ecapriolo> Absolutely.
[11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
[11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
[11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
and commons-utils all those things need to be packaged into non-conflicting
packages
[11:46:58] <noland> I haven't looked at how we build that yet but I agree
it'd be nice if we could jar-jar things like guava
[11:47:12] <noland> so we can actually use them on server side
[11:47:16] <ecapriolo> We don't really need guava. It's probably just used
for one tiny thing
[11:47:43] <ecapriolo> People are forgetting/do not understand that
hive-exec needs to get sent via the distributed cache
[11:47:57] <noland> When we implement range joins they have a RangeMap that
we'll need.
[11:47:57] <ecapriolo> so making it hulkingly fat just slows everything down
[11:48:11] <noland> Do we ship it every time?
[11:48:25] <noland> Cause we only have to ship it once per version of the
jar.
[11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib as
well
[11:48:46] <ecapriolo> hive will not work without it
[11:49:11] <ecapriolo> People are just focused
feature-feature-feature...bigger...bigger bigger
[11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit
IRC: Quit: Leaving
[11:49:27] <noland> yeah maven modules will definitely help us understand
who depends on what.
[11:49:28] <ecapriolo> Next up Kryo
[11:49:51] <noland> I agree there is a lot of tech debt that needs paying
[11:50:30] <ecapriolo> So those are all the high level things I want to
tackle
[11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
code, build a better non conflicting hive-exec jar
[11:51:10] <noland> That sounds good. Once we hack on github for a while
it'd be nice to develop a brief high level plan on how to implement
[11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
scopes like provided etc
[11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
pulling in the world
[11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
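
The "JDBC driver" approach to shims from the log above (compile against a shim interface only, then load a version-specific implementation by class name at runtime) could look roughly like the Java sketch below. It is a minimal illustration under assumptions, not Hive's actual shim code: `ShimLoaderSketch`, `HadoopShims`, `Hadoop20Shims`, `Hadoop23Shims`, `describe()` and the version-to-class map are hypothetical names chosen for this example, and the version string is passed in by hand. In a real build each implementation would presumably live in its own shim artifact, and the version would come from Hadoop at runtime (for example `org.apache.hadoop.util.VersionInfo.getVersion()`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "shims like a JDBC driver": callers compile only
// against the HadoopShims interface, and the implementation is chosen by
// class name at runtime. All names here are illustrative, not Hive's API.
public class ShimLoaderSketch {

    /** Version-neutral interface the rest of the code base compiles against. */
    public interface HadoopShims {
        String describe();
    }

    /** Stand-in for an implementation that would ship in a 0.20 shim jar. */
    public static class Hadoop20Shims implements HadoopShims {
        @Override
        public String describe() {
            return "shims for Hadoop 0.20";
        }
    }

    /** Stand-in for an implementation that would ship in a 0.23 shim jar. */
    public static class Hadoop23Shims implements HadoopShims {
        @Override
        public String describe() {
            return "shims for Hadoop 0.23";
        }
    }

    // Maps a Hadoop "major.minor" version to the binary class name of its shim.
    private static final Map<String, String> SHIM_CLASSES = new HashMap<>();

    static {
        SHIM_CLASSES.put("0.20", ShimLoaderSketch.class.getName() + "$Hadoop20Shims");
        SHIM_CLASSES.put("0.23", ShimLoaderSketch.class.getName() + "$Hadoop23Shims");
    }

    /** Reduces a version such as "0.20.2" to its "major.minor" prefix, e.g. "0.20". */
    static String majorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + hadoopVersion);
        }
        return parts[0] + "." + parts[1];
    }

    /** Instantiates the shim by reflection, the way old JDBC drivers were loaded via Class.forName. */
    public static HadoopShims load(String hadoopVersion) {
        String className = SHIM_CLASSES.get(majorVersion(hadoopVersion));
        if (className == null) {
            throw new IllegalArgumentException("No shim registered for Hadoop " + hadoopVersion);
        }
        try {
            Class<?> shimClass = Class.forName(className);
            return (HadoopShims) shimClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load shim class " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("0.20.2").describe()); // prints "shims for Hadoop 0.20"
        System.out.println(load("0.23.1").describe()); // prints "shims for Hadoop 0.23"
    }
}
```

Because callers only see the interface, the 0.20 and 0.23 implementations could in principle be compiled separately (per Maven profile or as separate projects, both options raised in the chat) and simply dropped onto the classpath; an unregistered version fails fast with an `IllegalArgumentException`.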

On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <ed...@gmail.com> wrote:

> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
> am growing tired of how long hive's builds take.
>
> I have started playing with this by creating a simple multi-module project
> and copying stuff as I go. I have ported a minimal shims and common and I
> have all the tests in common almost running.
>
> Q. This is going to be ugly hacky work for a while, I was thinking it
> should be a branch but it is just going to be a mess of moves and copies
> etc. Not really something you can diff etc.
>
> Is anyone else interested in working on this as well? If so, I think we can
> just set up a github and I can arrange for anyone to have access to it.
>
> Thanks,
> Edward
>

Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I am
growing tired of how long hive's builds take.

I have started playing with this by creating a simple multi-module project
and copying stuff as I go. I have ported a minimal shims and common and I
have all the tests in common almost running.

Q. This is going to be ugly hacky work for a while, I was thinking it
should be a branch but it is just going to be a mess of moves and copies
etc. Not really something you can diff etc.

Is anyone else interested in working on this as well? If so, I think we can
just set up a github and I can arrange for anyone to have access to it.

Thanks,
Edward

On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <ed...@gmail.com> wrote:

> "Some of the hard part was that some of the test classes are in the wrong
> module that references classes in a later module."
>
> I think the modules will have to be able to reference each other in many
> cases. Serde and QL are tightly coupled. QL is really too large and we
> should find a way to cut that up.
>
> Part of this problem is the q.tests
>
> I think one way to handle this is to only allow unit tests inside the
> module. I imagine running all the q tests would be done in a final module
> hive-qtest. Or possibly two final modules
> hive-qtest
> hive-qtest-extra (tangential things like UDFs and input formats not core
> to hive)
>
>

Re: [Discuss] project chop up

Posted by Edward Capriolo <ed...@gmail.com>.

"Some of the hard part was that some of the test classes are in the wrong
module that references classes in a later module."

I think the modules will have to be able to reference each other in many
cases. Serde and QL are tightly coupled. QL is really too large and we
should find a way to cut that up.

Part of this problem is the q.tests

I think one way to handle this is to only allow unit tests inside the
module. I imagine running all the q tests would be done in a final module
hive-qtest. Or possibly two final modules
hive-qtest
hive-qtest-extra (tangential things like UDFs and input formats not core to
hive)

On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <om...@apache.org> wrote:

> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
> kulkarni.swarnim@gmail.com> wrote:
>
> > > I'd like to propose we move towards Maven.
> >
> > Big +1 on this. Most of the major apache projects (hadoop, hbase, avro
> etc.)
> > are maven based.
> >
>
> A big +1 from me too. I actually took a pass at it a couple of months ago.
> Some of the hard part was that some of the test classes are in the wrong
> module that references classes in a later module. Obviously that prevents
> any kind of modular build.
>
> An additional plus to Maven is that it includes tools to correct the
> project and module dependencies.
>
> -- Owen
>

Re: [Discuss] project chop up

Posted by Owen O'Malley <om...@apache.org>.

On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

> > I'd like to propose we move towards Maven.
>
> Big +1 on this. Most of the major apache projects (hadoop, hbase, avro etc.)
> are maven based.
>

A big +1 from me too. I actually took a pass at it a couple of months ago.
Some of the hard part was that some of the test classes are in the wrong
module that references classes in a later module. Obviously that prevents
any kind of modular build.
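The misplaced-test problem can be made concrete with a small sketch. As far as I know, Maven's reactor counts test-scoped dependencies between modules when it orders a build, so a test in an early module that imports a later module closes a cycle and the build is refused. The module names and edges below are hypothetical; the sketch relies on Python's standard `graphlib` (3.9+), whose `prepare()` raises `CycleError` on a cyclic graph.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical graph (module -> modules it depends on). A test class in
# hive-ql imports hive-hbase-handler, which itself depends on hive-ql.
TANGLED = {
    "hive-common": set(),
    "hive-ql": {"hive-common", "hive-hbase-handler"},  # edge added by a misplaced test
    "hive-hbase-handler": {"hive-ql"},
}

# The same modules after moving that test into hive-hbase-handler.
FIXED = {
    "hive-common": set(),
    "hive-ql": {"hive-common"},
    "hive-hbase-handler": {"hive-ql"},
}


def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # graphlib stores the cycle in args[1]; its first and last items match.
        return err.args[1]
    return None


if __name__ == "__main__":
    print(find_cycle(TANGLED))
    print(find_cycle(FIXED))  # → None
```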

An additional plus is that Maven includes tools to correct the
project and module dependencies.
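One such tool is the maven-dependency-plugin's `dependency:analyze` goal, which reports dependencies a module uses without declaring (it compiles only because the jar arrives transitively) and dependencies it declares but never uses. The real plugin derives "used" from compiled bytecode; this hypothetical sketch only shows the set arithmetic behind those two reports, with made-up dependency names.

```python
def analyze(declared, used):
    """Mimic the two warnings printed by dependency:analyze.

    `declared` and `used` are sets of dependency names; in the real plugin
    the used set comes from bytecode analysis, here it is simply given.
    """
    return {
        # Referenced by the code but only reachable transitively.
        "used_undeclared": sorted(used - declared),
        # Listed in the build file but never referenced by the code.
        "unused_declared": sorted(declared - used),
    }


if __name__ == "__main__":
    declared = {"hive-common", "hive-serde", "commons-lang"}
    used = {"hive-common", "hive-serde", "hive-shims"}
    print(analyze(declared, used))
    # → {'used_undeclared': ['hive-shims'], 'unused_declared': ['commons-lang']}
```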

-- Owen

Re: [Discuss] project chop up

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.

> I'd like to propose we move towards Maven.

Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro,
etc.) are Maven-based.

I also couldn't agree more that the current build system is frustrating, to
say the least. Another issue I had with the existing Ant-based system is
that it has no checkpointing capabilities[1]. So if a 6-hour build fails
after 5 hours 30 minutes, most of the work has to be redone, even the parts
that succeeded, which is very time-consuming. The Maven reactor has
built-in support for a lot of this.
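For example, Maven's `-rf`/`--resume-from` option (e.g. `mvn install -rf :hive-ql`) restarts the reactor at a given module instead of from the top. The sketch below models only which modules a resumed build still runs; the module names and order are hypothetical, and it assumes the earlier modules' artifacts are already installed in the local repository so they can be resolved rather than rebuilt.

```python
# Hypothetical reactor order; in Maven it is derived from the module graph.
REACTOR_ORDER = [
    "hive-common", "hive-serde", "hive-metastore",
    "hive-ql", "hive-cli", "hive-qtest",
]


def modules_to_rebuild(reactor_order, failed):
    """Modules a resumed build still runs: the failed one and all after it."""
    if failed not in reactor_order:
        raise ValueError(f"unknown module: {failed}")
    return reactor_order[reactor_order.index(failed):]


def modules_skipped(reactor_order, failed):
    """Modules before the failure, assumed to be resolved from the local repo."""
    return reactor_order[:reactor_order.index(failed)]


if __name__ == "__main__":
    print(modules_to_rebuild(REACTOR_ORDER, "hive-ql"))
    # → ['hive-ql', 'hive-cli', 'hive-qtest']
```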

[1] https://issues.apache.org/jira/browse/HIVE-3449


On Wed, Aug 7, 2013 at 2:06 PM, Brock Noland <br...@cloudera.com> wrote:

> Thus far there hasn't been any dissent to managing our modules with Maven.
> In addition, there have been several positive comments on a move towards
> Maven. I'd like to add that Ivy seems to have issues managing multiple
> versions of libraries. For example, in HIVE-3632 the Ivy cache had to be
> cleared when testing patches that installed the new version of DataNucleus.
> I have had the same issue on HIVE-4388. Requiring the deletion of the Ivy
> cache is extremely painful for developers who don't have access to
> high-bandwidth connections or who live in areas far from California, where
> most of these jars are hosted.
>
> I'd like to propose we move towards Maven.
>
>
> On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam <mi...@yahoo.com> wrote:
>
> >
> >
> > Yes, Hive's build and test cases got convoluted as the project scope
> > gradually increased. This is the time to take action!
> >
> > Based on my other Apache experiences, I prefer option #3, "Breakup the
> > projects within our own source tree": make multiple modules or
> > sub-projects. By default, only key modules will be built.
> >
> > Maven could be a possible candidate.
> >
> > Regards,
> > Mohammad
> >
> >
> >
> > ________________________________
> >  From: Edward Capriolo <ed...@gmail.com>
> > To: "dev@hive.apache.org" <de...@hive.apache.org>
> > Sent: Saturday, July 27, 2013 7:03 AM
> > Subject: Re: [Discuss] project chop up
> >
> >
> > Or feel free to suggest a different approach. I am used to managing
> > software as multi-module Maven projects.
> > From a development standpoint, if I was working on Beeline, it would be
> > nice to only require some of the sub-projects to be open in my IDE to do
> > that. Also, managing everything globally is not ideal.
> >
> > Hive's project layout, build, and test infrastructure is just funky. It
> > has to do a few interesting things (shims, testing), but I do not think
> > what we are doing justifies the massive Ant build system we have. Ant is
> > so ten years ago.
> >
> >
> >
> > On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates <ga...@hortonworks.com> wrote:
> >
> > > But I assume they'd still be a part of targets like package, tar, and
> > > binary? Making them compile and test separately and explicitly load the
> > > core Hive jars from maven/ivy seems reasonable.
> > >
> > > Alan.
> > >
> > > On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
> > >
> > > > Hi,
> > > >
> > > > I think that's part of it, but I'd like to decouple the downstream
> > > > projects even further so that the only connection is the dependency
> > > > on the Hive jars.
> > > >
> > > > Brock
> > > > On Jul 26, 2013 10:10 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
> > > >
> > > >> I'm not sure how this is different from what hcat does today. It
> > > >> needs Hive's jars to compile, so it's one of the last things in the
> > > >> compile step. Would moving the other modules you note to be in the
> > > >> same category be enough? Did you want to also make it so that the
> > > >> default ant target doesn't compile those?
> > > >>
> > > >> Alan.
> > > >>
> > > >> On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
> > > >>
> > > >>> My mistake on saying hcat was a metastore fork. I had a brain fart
> > > >>> for a moment.
> > > >>>
> > > >>> One way we could do this is to create a folder called downstream. In
> > > >>> our release step we can execute the downstream builds and then copy
> > > >>> the files we need back. That way nothing downstream will be on the
> > > >>> classpath of the main project.
> > > >>>
> > > >>> This could help us break up ql as well. Things like exotic file
> > > >>> formats, and things that are pluggable like zk locking, can go here.
> > > >>> That might be overkill.
> > > >>>
> > > >>> For now we can focus on building downstream, and hivethrift1 might be
> > > >>> the first thing to try to move downstream.
> > > >>>
> > > >>>
> > > >>> On Friday, July 26, 2013, Thejas Nair <th...@hortonworks.com> wrote:
> > > >>>> +1 to the idea of making the build of core hive and other downstream
> > > >>>> components independent.
> > > >>>>
> > > >>>> bq.  I was under the impression that Hcat and hive-metastore was
> > > >>>> supposed to merge up somehow.
> > > >>>>
> > > >>>> The metastore code was never forked. Hcat was just using
> > > >>>> hive-metastore and making the metadata available to the rest of
> > > >>>> Hadoop (Pig, Java MR, ...).
> > > >>>> A lot of the changes that were driven by hcat goals were being made
> > > >>>> in hive-metastore. You can think of hcat as a set of libraries that
> > > >>>> let Pig and Java MR use the hive metastore. Since hcat is closely
> > > >>>> tied to hive-metastore, it makes sense to have them in the same
> > > >>>> project.
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Jul 26, 2013 at 6:33 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >>>>> Also, I believe hcatalog web can fall into the same designation.
> > > >>>>>
> > > >>>>> Question: hcatalog was initially a big hive-metastore fork. I was
> > > >>>>> under the impression that Hcat and hive-metastore was supposed to
> > > >>>>> merge up somehow. What is the status on that? I remember that was
> > > >>>>> one of the core reasons we brought it in.
> > > >>>>>
> > > >>>>> On Friday, July 26, 2013, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >>>>>> I prefer option 3 as well.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Jul 26, 2013 at 12:52 AM, Brock Noland <brock@cloudera.com> wrote:
> > > >>>>>>>
> > > >>>>>>> On Thu, Jul 25, 2013 at 9:48 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> I have been developing on a dual-core, 2 GB RAM laptop for years
> > > >>>>>>>> now. With the addition of hcatalog, hive-thrift2, and some other
> > > >>>>>>>> growth, trying to develop hive in Eclipse on this machine crawls,
> > > >>>>>>>> especially if 'build automatically' is turned on. As we look to
> > > >>>>>>>> add more things, this is only going to get worse.
> > > >>>>>>>>
> > > >>>>>>>> I am also noticing issues like this:
> > > >>>>>>>>
> > > >>>>>>>> https://issues.apache.org/jira/browse/HIVE-4849
> > > >>>>>>>>
> > > >>>>>>>> What I think we should do is strip down/out optional parts of
> > > >>>>>>>> hive.
> > > >>>>>>>>
> > > >>>>>>>> 1) Hive Hbase
> > > >>>>>>>> This should really be its own project. To do this right, we
> > > >>>>>>>> really have to have multiple branches, since hbase is not
> > > >>>>>>>> backwards compatible.
> > > >>>>>>>>
> > > >>>>>>>> 2) Hive Web Interface
> > > >>>>>>>> Now really a big project but not really critical; it can just as
> > > >>>>>>>> easily be built separately.
> > > >>>>>>>>
> > > >>>>>>>> 3) hive thrift 1
> > > >>>>>>>> We have hive thrift 2 now; it is time for the sun to set on
> > > >>>>>>>> hivethrift1.
> > > >>>>>>>>
> > > >>>>>>>> 4) odbc
> > > >>>>>>>> Not entirely convinced about this one, but it is really not
> > > >>>>>>>> critical to running hive.
> > > >>>>>>>>
> > > >>>>>>>> What I think we should do is create sub-projects for the above
> > > >>>>>>>> things or simply move them into directories that do not build with
> > > >>>>>>>> hive. Ideally they would use maven to pull dependencies.
> > > >>>>>>>>
> > > >>>>>>>> What does everyone think?
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> I agree that projects like the HBase handler, and probably others
> > > >>>>>>> as well, should somehow be "downstream" projects which simply
> > > >>>>>>> depend on the hive jars. I see a couple of alternatives for this:
> > > >>>>>>>
> > > >>>>>>> * Take the "module" in question to the Apache Incubator
> > > >>>>>>> * Move the "module" in question to the Apache Extras
> > > >>>>>>> * Breakup the projects within our own source tree
> > > >>>>>>>
> > > >>>>>>> I'd prefer the third option at this point.
> > > >>>>>>>
> > > >>>>>>> Brock
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>



-- 
Swarnim