Posted to dev@hive.apache.org by Alan Gates <al...@gmail.com> on 2015/05/12 00:38:02 UTC

[DISCUSS] Supporting Hadoop-1 and experimental features

There is a lot of forward-looking work going on in various branches of 
Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It 
would be good to have a way to release this code to users so that they 
can experiment with it.  Releasing it will also provide feedback to 
developers.

At the same time, there are discussions on whether to keep supporting
Hadoop-1.  The burden of supporting older, less used functionality such
as Hadoop-1 is growing ever heavier as many new features are added.

I propose that the best way to deal with this would be to make a 
branch-1.  We could continue to make new feature releases off of this 
branch (1.3, 1.4, etc.).  This branch would not drop old functionality.  
This provides stability and continuity for users and developers.

We could then merge these new feature branches (LLAP, HBase metastore,
CLI drop) into the trunk, as well as turn on by default newer features
such as vectorization and ACID.  We could also drop older, less used
features such as support for Hadoop-1 and MapReduce.  It will be a while
before we are ready to make stable, production-ready releases of this
code, but we could start making alpha-quality releases soon.  We would
call these releases 2.x, to stress the backward-incompatible changes
such as dropping Hadoop-1.  This will give users a chance to play with
the new code and developers a chance to get feedback.

Thoughts?

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alexander Pivovarov <ap...@gmail.com>.

Alan, your email client is not compatible with the Gmail viewer. For some
reason your reply contains the whole thread of the discussion.

On May 22, 2015 10:58 AM, "Alan Gates" <al...@gmail.com> wrote:

> I don't think anyone is advocating for option 2, as that would be
> disastrous.  Option 3 is closest to what I'm proposing, though again
> dropping support for Hadoop 1 is only a part of it.
>
> Alan.
>
>   Alexander Pivovarov <ap...@gmail.com>
>  May 22, 2015 at 10:03
> Looks like we are discussing 3 options:
>
> 1. Support hadoop 1, 2 and 3 in master branch.
>
> 2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in branch-3
>
> 3. Support hadoop 2 and 3 in master
>
> I do NOT think option 2 is a good solution because it is much more difficult
> to manage 3 active prod branches than one master branch.
>
> I think we should go with option 1 or 3.
>
> +1 on Xuefu's and Edward's opinions.
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 22, 2015 at 9:08
> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to the general increase in the amount of work done in Hive
> by the community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now; in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not the added
> functionality.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
>
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
>
>
>   Chris Drome <cd...@yahoo-inc.com.INVALID>
>  May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where more
> disruptive work can go on without affecting branch-1. While not necessarily
> against this approach, from Yahoo's standpoint, I do have some questions
> (concerns).
> Upgrading to a new version of Hive requires a significant commitment of
> time and resources to stabilize and certify a build for deployment to our
> clusters. Given the size of our clusters and scale of datasets, we have to
> be particularly careful about adopting new functionality. However, at the
> same time we are interested in testing and making available new
> features and functionality. That said, we would have to rely on branch-1
> for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point
> there would be no option but for users to move to branch-2 as branch-1
> would be effectively end-of-lifed. I'm not sure how long this would take,
> but it would eventually happen as a direct result of the very reason for
> creating branch-2.
> A related concern is how disruptive the code changes will be in branch-2.
> I imagine that changes made early in branch-2 will be easy to backport to
> branch-1, while this effort will become more difficult, if not impractical,
> as time goes on. If the code bases diverge too much then this could lead to
> more pressure for users of branch-1 to add features just to branch-1, which
> has been mentioned as undesirable. By the same token, backporting any code
> in branch-2 will require an increasing amount of effort, which contributors
> to branch-2 may not be interested in committing to.
> These questions affect us directly because, while we require a certain
> amount of stability, we also like to pull in new functionality that will be
> of value to our users. For example, our current 0.13 release is probably
> closer to 0.14 at this point. Given the lifespan of a release, it is often
> more palatable to backport features and bugfixes than to jump to a new
> version.
>
> The good thing about this proposal is the opportunity to evaluate and
> clean up a lot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>
>
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 18, 2015 at 11:47
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what HBase did
> (which is not pretty imho), where the 0.94 version had ~30 dot releases
> because people cannot upgrade to the 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR in this day and age (and more
> so as time passes) are people who don’t care about perf and new
> features, only stability; so a stability-focused branch would be perfect to
> support them.
>
>
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.

I don't think anyone is advocating for option 2, as that would be 
disastrous.  Option 3 is closest to what I'm proposing, though again 
dropping support for Hadoop 1 is only a part of it.

Alan.

> Alexander Pivovarov <ma...@gmail.com>
> May 22, 2015 at 10:03
> Looks like we are discussing 3 options:
>
> 1. Support hadoop 1, 2 and 3 in master branch.
>
> 2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in 
> branch-3
>
> 3. Support hadoop 2 and 3 in master
>
> I do NOT think option 2 is a good solution because it is much more difficult
> to manage 3 active prod branches than one master branch.
>
> I think we should go with option 1 or 3.
>
> +1 on Xuefu's and Edward's opinions.
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 22, 2015 at 9:08
> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to the general increase in the amount of work done in Hive
> by the community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now; in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not the added
> functionality.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
>
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
>
>
> Chris Drome <ma...@yahoo-inc.com.INVALID>
> May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where 
> more disruptive work can go on without affecting branch-1. While not 
> necessarily against this approach, from Yahoo's standpoint, I do have 
> some questions (concerns).
> Upgrading to a new version of Hive requires a significant commitment 
> of time and resources to stabilize and certify a build for deployment 
> to our clusters. Given the size of our clusters and scale of datasets, 
> we have to be particularly careful about adopting new functionality. 
> However, at the same time we are interested in testing and making
> available new features and functionality. That said, we would have to 
> rely on branch-1 for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point 
> there would be no option but for users to move to branch-2 as branch-1 
> would be effectively end-of-lifed. I'm not sure how long this would 
> take, but it would eventually happen as a direct result of the very 
> reason for creating branch-2.
> A related concern is how disruptive the code changes will be in 
> branch-2. I imagine that changes made early in branch-2 will be easy to
> backport to branch-1, while this effort will become more difficult, if
> not impractical, as time goes on. If the code bases diverge too much then
> this could lead to more pressure for users of branch-1 to add features 
> just to branch-1, which has been mentioned as undesirable. By the same 
> token, backporting any code in branch-2 will require an increasing 
> amount of effort, which contributors to branch-2 may not be interested 
> in committing to.
> These questions affect us directly because, while we require a certain 
> amount of stability, we also like to pull in new functionality that 
> will be of value to our users. For example, our current 0.13 release 
> is probably closer to 0.14 at this point. Given the lifespan of a 
> release, it is often more palatable to backport features and bugfixes 
> than to jump to a new version.
>
> The good thing about this proposal is the opportunity to evaluate and 
> clean up a lot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin 
> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>
>
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 18, 2015 at 11:47
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what HBase did
> (which is not pretty imho), where the 0.94 version had ~30 dot releases
> because people cannot upgrade to the 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR in this day and age (and more
> so as time passes) are people who don’t care about perf and new
> features, only stability; so a stability-focused branch would be perfect to
> support them.
>
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alexander Pivovarov <ap...@gmail.com>.

Looks like we are discussing 3 options:

1. Support hadoop 1, 2 and 3 in master branch.

2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in branch-3

3. Support hadoop 2 and 3 in master

I do NOT think option 2 is a good solution because it is much more difficult
to manage 3 active prod branches than one master branch.

I think we should go with option 1 or 3.

+1 on Xuefu's and Edward's opinions.

On May 22, 2015 9:09 AM, "Sergey Shelukhin" <se...@hortonworks.com> wrote:

> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to the general increase in the amount of work done in Hive
> by the community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now; in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not the added
> functionality.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
>
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
>
> On 15/5/22, 00:49, "Chris Drome" <cd...@yahoo-inc.com.INVALID> wrote:
>
> >I understand the motivation and benefits of creating a branch-2 where
> >more disruptive work can go on without affecting branch-1. While not
> >necessarily against this approach, from Yahoo's standpoint, I do have
> >some questions (concerns).
> >Upgrading to a new version of Hive requires a significant commitment of
> >time and resources to stabilize and certify a build for deployment to our
> >clusters. Given the size of our clusters and scale of datasets, we have
> >to be particularly careful about adopting new functionality. However, at
> >the same time we are interested in testing and making available new
> >features and functionality. That said, we would have to rely on branch-1
> >for the immediate future.
> >One concern is that branch-1 would be left to stagnate, at which point
> >there would be no option but for users to move to branch-2 as branch-1
> >would be effectively end-of-lifed. I'm not sure how long this would take,
> >but it would eventually happen as a direct result of the very reason for
> >creating branch-2.
> >A related concern is how disruptive the code changes will be in branch-2.
> >I imagine that changes made early in branch-2 will be easy to backport to
> >branch-1, while this effort will become more difficult, if not
> >impractical, as time goes on. If the code bases diverge too much then this
> >could lead to more pressure for users of branch-1 to add features just to
> >branch-1, which has been mentioned as undesirable. By the same token,
> >backporting any code in branch-2 will require an increasing amount of
> >effort, which contributors to branch-2 may not be interested in
> >committing to.
> >These questions affect us directly because, while we require a certain
> >amount of stability, we also like to pull in new functionality that will
> >be of value to our users. For example, our current 0.13 release is
> >probably closer to 0.14 at this point. Given the lifespan of a release,
> >it is often more palatable to backport features and bugfixes than to jump
> >to a new version.
> >
> >The good thing about this proposal is the opportunity to evaluate and
> >clean up a lot of the old code.
> >Thanks,
> >chris
> >
> >
> >
> >     On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> ><se...@hortonworks.com> wrote:
> >
> >
> >Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> >people are set in their ways or have practical considerations and don’t
> >care for new shiny stuff.
> >
> >On 15/5/18, 11:46, "Sergey Shelukhin" <se...@hortonworks.com> wrote:
> >
> >>I think we need some path for deprecating old Hadoop versions, the same
> >>way we deprecate old Java version support or old RDBMS version support.
> >>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> >>goes for stuff like MR; supporting it, esp. for perf work, becomes a
> >>burden, and it’s outdated with 2 alternatives, one of which has been
> >>around for 2 releases.
> >>The branches are a graceful way to get rid of the legacy burden.
> >>
> >>Alternatively, when sweeping changes are made, we can do what HBase did
> >>(which is not pretty imho), where the 0.94 version had ~30 dot releases
> >>because people cannot upgrade to the 0.96 “singularity” release.
> >>
> >>
> >>I posit that people who run Hadoop 1 and MR in this day and age (and more
> >>so as time passes) are people who don’t care about perf and new
> >>features, only stability; so a stability-focused branch would be perfect
> >>to support them.
> >>
> >>
> >>On 15/5/18, 10:04, "Edward Capriolo" <ed...@gmail.com> wrote:
> >>
> >>>Up until recently Hive supported numerous versions of the Hadoop code base
> >>>with a simple shim layer. I would rather we stick to the shim layer. I think
> >>>easily the best part about Hive was that a single release worked well
> >>>regardless of your Hadoop version. It was also a key element of Hive's
> >>>success. I do not want to see us have multiple branches.
> >>>
> >>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com>
> >>>wrote:
> >>>
> >>>> Thanks for the explanation, Alan!
> >>>>
> >>>> While I have understood more on the proposal, I actually see more problems
> >>>> than the confusion of two lines of releases. Essentially, this proposal
> >>>> forces a user to make a hard choice between a more stable, legacy-aware
> >>>> release line and an adventurous, pioneering release line. And once the
> >>>> choice is made, there is no easy way back or forward.
> >>>>
> >>>> Here is my interpretation. Let's say we have two main branches as
> >>>> proposed. I develop a new feature which I think is useful for both
> >>>> branches. So, I commit it to both branches. My feature requires additional
> >>>> schema support, so I provide upgrade scripts for both branches. The scripts
> >>>> are different because the two branches have already diverged in schema.
> >>>>
> >>>> Now the two branches evolve in a diverging fashion like this. This is all
> >>>> good as long as a user stays in his line. The moment the user considers a
> >>>> switch, most likely from branch-1 to branch-2, he is stuck. Why? Because
> >>>> there is no upgrade path from a release in branch-1 to a release in
> >>>> branch-2!
> >>>>
> >>>> If we want to provide an upgrade path, then there will be MxN paths, where
> >>>> M and N are the number of releases in the two branches, respectively. This
> >>>> is going to be next to a nightmare, not only for users, but also for us.
> >>>>
> >>>> Also, the proposal will require two sets of things that Hive provides:
> >>>> double documentation, double feature tracking, double build/test
> >>>> infrastructures, etc.
> >>>>
> >>>> This approach can also potentially cause the problem we saw in hadoop
> >>>> releases, where the 0.23 release was greater than the 1.0 release.
> >>>>
> >>>> To me, the problem we are trying to solve is deprecating old things such
> >>>> as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see
> >>>> it, however, we approached the problem in less favorable ways.
> >>>>
> >>>> First, it seemed we wanted to deprecate something just for the sake of
> >>>> deprecation, and it's not based on the rationale that supports the desire.
> >>>> Dev might write code that accidentally breaks the hadoop-1 build. However,
> >>>> this is more a build infrastructure problem than the burden of supporting
> >>>> hadoop-1. If our build could catch it at the precommit test, then I would
> >>>> think the accident can be well avoided. Most of the time, fixing the build
> >>>> is trivial. And we have already addressed the build infrastructure problem.
> >>>>
> >>>> Secondly, if we do have a strong reason to deprecate something, we should
> >>>> have a deprecation plan rather than declaring on the spot that the current
> >>>> release is the last one supporting X. I think Microsoft did a better job in
> >>>> terms of product deprecation. For instance, they announced the end of
> >>>> Windows XP support long before the last day. In my opinion, we should have
> >>>> a similar vision, giving users and distributions enough time to adjust
> >>>> rather than shocking them with breaking news.
> >>>>
> >>>> In summary, I do see the need for deprecation in Hive, but I am afraid the
> >>>> way we are taking, including the proposal here, isn't going to nicely solve
> >>>> the problem. On the contrary, I foresee a spectrum of confusion,
> >>>> frustration, and burden for the user as well as for developers.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com>
> >>>>wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>  Xuefu Zhang <xz...@cloudera.com>
> >>>>>  May 15, 2015 at 17:31
> >>>>>
> >>>>> Just make sure that I understand the proposal correctly: we are going to
> >>>>> have two main branches, one for hadoop-1 and one for hadoop-2.
> >>>>>
> >>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
> >>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2 is
> >>>>> already well established.
> >>>>>
> >>>>>  New features are only merged to branch-2. That essentially says we stop
> >>>>> development for hadoop-1, right?
> >>>>>
> >>>>>  If developers want to keep contributing patches to branch-1 then
> >>>>> there's no need for it to stop.  We would want to avoid putting new
> >>>>> features only on branch-1, unless they only made sense in that context.
> >>>>> But I assume we'll see people contributing to branch-1 for some time.
> >>>>>
> >>>>>  Are we also making two lines of releases: one for branch-1
> >>>>> and one for branch-2? Won't that be confusing and also burdensome if we
> >>>>> release say 1.3, 2.0, 2.1, 1.4...
> >>>>>
> >>>>>  I'm asserting that it will be less confusing than the alternatives.  We
> >>>>> need some way to make early releases of many of the new features.  I
> >>>>> believe that this proposal is less confusing than if we start putting the
> >>>>> new features in 1.x branches.  This is particularly true because it would
> >>>>> help us to start being able to drop older functionality like Hadoop-1 and
> >>>>> MapReduce, which is very hard to do in the 1.x line without stranding
> >>>>> users.
> >>>>>
> >>>>>  Please note that we will have hadoop 3 soon. What's the story there?
> >>>>>
> >>>>>  As I said above, I don't see this as tied to Hadoop versions.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>>  Thanks,
> >>>>> Xuefu
> >>>>>
> >>>>>
> >>>>>
> >>>>>  Vaibhav Gumashta <vg...@hortonworks.com>
> >>>>>  May 15, 2015 at 16:43
> >>>>>  +1 on the new branch. I think it’ll help in faster dev time for these
> >>>>> important changes.
> >>>>>
> >>>>>  —Vaibhav
> >>>>>
> >>>>>  From: Alan Gates <al...@gmail.com>
> >>>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
> >>>>> Date: Friday, May 15, 2015 at 4:11 PM
> >>>>> To: "dev@hive.apache.org" <de...@hive.apache.org>
> >>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
> >>>>>
> >>>>>  Anyone else have feedback on this?  If not I'll start a vote next week.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>>    Gopal Vijayaraghavan <go...@apache.org>
> >>>>>  May 14, 2015 at 10:44
> >>>>> Hi,
> >>>>>
> >>>>> +1 on the idea.
> >>>>>
> >>>>> Having a stable release branch with ongoing fixes where we do not drop
> >>>>> major features would be good all around.
> >>>>>
> >>>>> It lets us accelerate the pace of development, drop major features or
> >>>>> rewrite them entirely without dragging everyone else kicking & screaming
> >>>>> into that release.
> >>>>>
> >>>>> Cheers,
> >>>>> Gopal
> >>>>>
> >>>>>
> >>>>>
> >>>>>  Sergey Shelukhin <se...@hortonworks.com>
> >>>>>  May 11, 2015 at 19:17
> >>>>> That sounds like a good idea.
> >>>>> Some features could be back ported to branch-1 if viable, but at least new
> >>>>> stuff would not be burdened by Hadoop 1/MR code paths.
> >>>>> Probably also a good place to enable vectorization and other perf features
> >>>>> by default while we make alpha releases.
> >>>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>>  Alan Gates <al...@gmail.com>
> >>>>>  May 11, 2015 at 15:38
> >>>>> There is a lot of forward-looking work going on in various branches of
> >>>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
> >>>>> be good to have a way to release this code to users so that they can
> >>>>> experiment with it.  Releasing it will also provide feedback to
> >>>>> developers.
> >>>>>
> >>>>> At the same time there are discussions on whether to keep supporting
> >>>>> Hadoop-1.  The burden of supporting older, less used functionality such as
> >>>>> Hadoop-1 is becoming ever harder as many new features are added.
> >>>>>
> >>>>> I propose that the best way to deal with this would be to make a
> >>>>> branch-1.  We could continue to make new feature releases off of this
> >>>>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
> >>>>> This provides stability and continuity for users and developers.
> >>>>>
> >>>>> We could then merge these new features branches (LLAP, HBase metastore,
> >>>>> CLI drop) into the trunk, as well as turn on by default newer features such
> >>>>> as the vectorization and ACID.  We could also drop older, less used
> >>>>> features such as support for Hadoop-1 and MapReduce.  It will be a while
> >>>>> before we are ready to make stable, production ready releases of this
> >>>>> code.  But we could start making alpha quality releases soon.  We would
> >>>>> call these releases 2.x, to stress the non-backward compatible changes such
> >>>>> as dropping Hadoop-1.  This will give users a chance to play with the new
> >>>>> code and developers a chance to get feedback.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>>
> >>>>
> >>
> >
> >
> >
> >
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Sergey Shelukhin <se...@hortonworks.com>.

I think branch-2 doesn’t need to be framed as particularly adventurous
(other than due to the general increase in the amount of work done in Hive
by the community).
All the new features that normally go on trunk/master will go to branch-2.
branch-2 is just trunk as it is now; in fact there will be no branch-2,
just master :) The difference is the dropped functionality, not the added
functionality.
So you shouldn’t lose stability if you retain the same process as now by
just staying on versions off master.

Perhaps, as is usually the case in Apache projects, developing features on
older branches would be discouraged. Right now, all features usually go on
trunk/master, and are then back ported as needed and practical; so you
wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
and not back port to master.

On 15/5/22, 00:49, "Chris Drome" <cd...@yahoo-inc.com.INVALID> wrote:

>I understand the motivation and benefits of creating a branch-2 where
>more disruptive work can go on without affecting branch-1. While not
>necessarily against this approach, from Yahoo's standpoint, I do have
>some questions (concerns).
>Upgrading to a new version of Hive requires a significant commitment of
>time and resources to stabilize and certify a build for deployment to our
>clusters. Given the size of our clusters and scale of datasets, we have
>to be particularly careful about adopting new functionality. However, at
>the same time we are interested in new testing and making available new
>features and functionality. That said, we would have to rely on branch-1
>for the immediate future.
>One concern is that branch-1 would be left to stagnate, at which point
>there would be no option but for users to move to branch-2 as branch-1
>would be effectively end-of-lifed. I'm not sure how long this would take,
>but it would eventually happen as a direct result of the very reason for
>creating branch-2.
>A related concern is how disruptive the code changes will be in branch-2.
>I imagine that changes in early in branch-2 will be easy to backport to
>branch-1, while this effort will become more difficult, if not
>impractical, as time goes. If the code bases diverge too much then this
>could lead to more pressure for users of branch-1 to add features just to
>branch-1, which has been mentioned as undesirable. By the same token,
>backporting any code in branch-2 will require an increasing amount of
>effort, which contributors to branch-2 may not be interested in
>committing to.
>These questions affect us directly because, while we require a certain
>amount of stability, we also like to pull in new functionality that will
>be of value to our users. For example, our current 0.13 release is
>probably closer to 0.14 at this point. Given the lifespan of a release,
>it is often more palatable to backport features and bugfixes than to jump
>to a new version.
>
>The good thing about this proposal is the opportunity to evaluate and
>clean up alot of the old code.
>Thanks,
>chris
> 
>
>
>     On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
><se...@hortonworks.com> wrote:
>   
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but
>some
>people are set in their ways or have practical considerations and don’t
>care for new shiny stuff.
>
>On 15/5/18, 11:46, "Sergey Shelukhin" <se...@hortonworks.com> wrote:
>
>>I think we need some path for deprecating old Hadoop versions, the same
>>way we deprecate old Java version support or old RDBMS version support.
>>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
>>goes for stuff like MR; supporting it, esp. for perf work, becomes a
>>burden, and it’s outdated with 2 alternatives, one of which has been
>>around for 2 releases.
>>The branches are a graceful way to get rid of the legacy burden.
>>
>>Alternatively, when sweeping changes are made, we can do what Hbase did
>>(which is not pretty imho), where 0.94 version had ~30 dot releases
>>because people cannot upgrade to 0.96 “singularity” release.
>>
>>
>>I posit that people who run Hadoop 1 and MR at this day and age (and more
>>so as time passes) are people who either don’t care about perf and new
>>features, only stability; so, stability-focused branch would be perfect
>>to
>>support them.
>>
>>
>>On 15/5/18, 10:04, "Edward Capriolo" <ed...@gmail.com> wrote:
>>
>>>Up until recently Hive supported numerous versions of Hadoop code base
>>>with
>>>a simple shim layer. I would rather we stick to the shim layer. I think
>>>this was easily the best part about hive was that a single release
>>>worked
>>>well regardless of your hadoop version. It was also a key element to
>>>hive's
>>>success. I do not want to see us have multiple branches.
>>>
>>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com>
>>>wrote:
>>>
>>>> Thanks for the explanation, Alan!
>>>>
>>>> While I have understood more on the proposal, I actually see more
>>>>problems
>>>> than the confusion of two lines of releases. Essentially, this
>>>>proposal
>>>> forces a user to make a hard choice between a stabler, legacy-aware
>>>>release
>>>> line and an adventurous, pioneering release line. And once the choice
>>>>is
>>>> made, there is no easy way back or forward.
>>>>
>>>> Here is my interpretation. Let's say we have two main branches as
>>>> proposed. I develop a new feature which I think useful for both
>>>>branches.
>>>> So, I commit it to both branches. My feature requires additional
>>>>schema
>>>> support, so I provide upgrade scripts for both branches. The scripts
>>>>are
>>>> different because the two branches have already diverged in schema.
>>>>
>>>> Now the two branches evolve in a diverging fashion like this. This is
>>>>all
>>>> good as long as a user stays in his line. The moment the user
>>>>considers
>>>>a
>>>> switch, mostly likely, from branch-1 to branch-2, he is stuck. Why?
>>>>Because
>>>> there is no upgrade path from a release in branch-1 to a release in
>>>> branch-2!
>>>>
>>>> If we want to provide an upgrade path, then there will be MxN paths,
>>>>where
>>>> M and N are the number of releases in the two branches, respectively.
>>>>This
>>>> is going to be next to a nightmare, not only for users, but also for
>>>>us.
>>>>
>>>> Also, the proposal will require two sets of things that Hive provides:
>>>> double documentation, double feature tracking, double build/test
>>>> infrastructures, etc.
>>>>
>>>> This approach can also potentially cause the problem we saw in hadoop
>>>> releases, where 0.23 release was greater than 1.0 release.
>>>>
>>>> To me, the problem we are trying to solve is deprecating old things
>>>>such
>>>> hadoop-1, Hive CLI, etc. This a valid problem to be solved. As I see,
>>>> however, we approached the problem in less favorable ways.
>>>>
>>>> First, it seemed we wanted to deprecate something just for the sake of
>>>> deprecation, and it's not based on the rationale that supports the
>>>>desire.
>>>> Dev might write code that accidentally break hadoop-1 build. However,
>>>>this
>>>> is more a build infrastructure problem rather than the burden of
>>>>supporting
>>>> hadoop-1. If our build could catch it at precommit test, then I would
>>>>think
>>>> the accident can be well avoided. Most of the times, fixing the build
>>>>is
>>>> trivial. And we have already addressed the build infrastructure
>>>>problem.
>>>>
>>>> Secondly, if we do have a strong reason to deprecate something, we
>>>>should
>>>> have a deprecation plan rather than declaring on the spot that the
>>>>current
>>>> release is the last one supporting X. I think Microsoft did a better
>>>>job in
>>>> terms production deprecation. For instance, they announced long before
>>>>the
>>>> last day desupporting Windows XP. In my opinion, we should have a
>>>>similar
>>>> vision, giving users, distributions enough time to adjust rather than
>>>> shocking them with breaking news.
>>>>
>>>> In summary, I do see the need of deprecation in Hive, but I am afraid
>>>>the
>>>> way we take, including the proposal here, isn't going to nicely solve
>>>>the
>>>> problem. On the contrary, I foresee a spectrum of confusion,
>>>>frustration,
>>>> and burden for the user as well as for developers.
>>>>
>>>> Thanks,
>>>> Xuefu
>>>>
>>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com>
>>>>wrote:
>>>>
>>>>>
>>>>>
>>>>>  Xuefu Zhang <xz...@cloudera.com>
>>>>>  May 15, 2015 at 17:31
>>>>>
>>>>> Just make sure that I understand the proposal correctly: we are going
>>>>>to
>>>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>>>
>>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not
>>>>>Hadoop.
>>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>>>is
>>>>> already well established.
>>>>>
>>>>>  New features
>>>>> are only merged to branch-2. That essentially says we stop
>>>>>development
>>>>>for
>>>>> hadoop-1, right?
>>>>>
>>>>>  If developers want to keep contributing patches to branch-1 then
>>>>> there's no need for it to stop.  We would want to avoid putting new
>>>>> features only on branch-1, unless they only made sense in that
>>>>>context.
>>>>> But I assume we'll see people contributing to branch-1 for some time.
>>>>>
>>>>>  Are we also making two lines of releases: one for branch-1
>>>>> and one for branch-2? Won't that be confusing and also burdensome if
>>>>>we
>>>>> release say 1.3, 2.0, 2.1, 1.4...
>>>>>
>>>>>  I'm asserting that it will be less confusing than the alternatives.
>>>>>We
>>>>> need some way to make early releases of many of the new features.  I
>>>>> believe that this proposal is less confusing than if we start putting
>>>>>the
>>>>> new features in 1.x branches.  This is particularly true because it
>>>>>would
>>>>> help us to start being able to drop older functionality like Hadoop-1
>>>>>and
>>>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>>>users.
>>>>>
>>>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>>>
>>>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>>>
>>>>> Alan.
>>>>>
>>>>>  Thanks,
>>>>> Xuefu
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>>>><vgumashta@hortonworks.com
>>>>>
>>>>> wrote:
>>>>>
>>>>>  +1 on the new branch. I think it’ll help in faster dev time for
>>>>>these
>>>>> important changes.
>>>>>
>>>>>  —Vaibhav
>>>>>
>>>>>  From: Alan Gates <al...@gmail.com> <al...@gmail.com>
>>>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>>>>><de...@hive.apache.org> <de...@hive.apache.org>
>>>>> Date: Friday, May 15, 2015 at 4:11 PM
>>>>> To: "dev@hive.apache.org" <de...@hive.apache.org> <de...@hive.apache.org>
>>>>><de...@hive.apache.org>
>>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>>>
>>>>>  Anyone else have feedback on this?  If not I'll start a vote next
>>>>>week.
>>>>>
>>>>> Alan.
>>>>>
>>>>>    Gopal Vijayaraghavan <go...@apache.org> <go...@apache.org>
>>>>> May 14, 2015 at 10:44
>>>>>  Hi,
>>>>>
>>>>> +1 on the idea.
>>>>>
>>>>> Having a stable release branch with ongoing fixes where we do not
>>>>>drop
>>>>> major features would be good all around.
>>>>>
>>>>> It lets us accelerate the pace of development, drop major features or
>>>>> rewrite them entirely without dragging everyone else kicking &
>>>>>screaming
>>>>> into that release.
>>>>>
>>>>> Cheers,
>>>>> Gopal
>>>>>
>>>>>
>>>>>
>>>>>    Sergey Shelukhin <se...@hortonworks.com> <se...@hortonworks.com>
>>>>> May 11, 2015 at 19:17
>>>>>  That sounds like a good idea.
>>>>> Some features could be back ported to branch-1 if viable, but at
>>>>>least
>>>>>new
>>>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>>>> Probably also a good place to enable vectorization and other perf
>>>>>features
>>>>> by default while we make alpha releases.
>>>>>
>>>>> +1
>>>>>
>>>>>
>>>>>    Alan Gates <al...@gmail.com> <al...@gmail.com>
>>>>> May 11, 2015 at 15:38
>>>>>  There is a lot of forward-looking work going on in various branches
>>>>>of
>>>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>>>>would
>>>>> be good to have a way to release this code to users so that they can
>>>>> experiment with it.  Releasing it will also provide feedback to
>>>>>developers.
>>>>>
>>>>> At the same time there are discussions on whether to keep supporting
>>>>> Hadoop-1.  The burden of supporting older, less used functionality
>>>>>such as
>>>>> Hadoop-1 is becoming ever harder as many new features are added.
>>>>>
>>>>> I propose that the best way to deal with this would be to make a
>>>>> branch-1.  We could continue to make new feature releases off of this
>>>>> branch (1.3, 1.4, etc.).  This branch would not drop old
>>>>>functionality.
>>>>> This provides stability and continuity for users and developers.
>>>>>
>>>>> We could then merge these new features branches (LLAP, HBase
>>>>>metastore,
>>>>> CLI drop) into the trunk, as well as turn on by default newer
>>>>>features
>>>>>such
>>>>> as the vectorization and ACID.  We could also drop older, less used
>>>>> features such as support for Hadoop-1 and MapReduce.  It will be a
>>>>>while
>>>>> before we are ready to make stable, production ready releases of this
>>>>> code.  But we could start making alpha quality releases soon.  We
>>>>>would
>>>>> call these releases 2.x, to stress the non-backward compatible
>>>>>changes
>>>>>such
>>>>> as dropping Hadoop-1.  This will give users a chance to play with the
>>>>>new
>>>>> code and developers a chance to get feedback.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>>
>>>>>
>>>>
>>
>
>
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Edward Capriolo <ed...@gmail.com>.

"Same goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases."

I am not trying to pick on your words here but I want to acknowledge
something.

"Been around for 2 releases" means less to people than you would think.
Many users are locked in by when their distribution chooses to cut a
release. As it turns out, there are two major distributions, and one of
them does pretty much nothing to support Tez. Here is what "around for two
releases" means for a CDH user:

http://search-hadoop.com/m/8er9RFVSf2&subj=Re+Getting+Tez+working+against+cdh+5+3

After much hacking with a rather new CDH version I was actually unable to
get the alternative running.

The other alternative, which I presume means Hive-on-Spark, probably has
not shipped in many distributions either. I do not think either
"alternative" has much real-world battlefield experience.

The reality is that a normal user has to test a series of processes before
they can pull the trigger on an upgrade. For example, I used to work at an
adtech company. Hive added a feature called "Exchange partitions". This
actually broke a number of our processes because we used the word
"exchange" all the time. It became a keyword, and many of our scripts
broke. This is not a fault of Hive or the feature, but it is just a fact
that no one wants to touch and test big lumbering ETL processes (even with
lightning-fast sexy engines) five times a year.

I mentioned this before but I want to repeat it. Hive was "releasable trunk"
for a long time and it served users well. We never had 2-4 feature
branches. One binary dropped on top of hadoop 17, 20, 21, 203 and 2.0. If
we get into a situation where all the "old users" "don't care about new
features", we can easily land in a situation where our actual users are
running the "old" hadoop, unable to upgrade to the "hive with the new
features" because it requires dependencies < 2 months old that are not
ported to their distribution yet. As a user I am already starting to see
this, where the distributions lag behind Hive because a point upgrade is
not compelling for the distributor.

On Fri, May 22, 2015 at 4:19 PM, Alan Gates <al...@gmail.com> wrote:

> I agree with *All* features with the exception that some features might be
> branch-1 specific (if it's a feature on something no longer supported in
> master, like hadoop-1).  Without this we prevent new features for older
> technology, which doesn't strike me as reasonable.
>
> I see your point on saying the contributor may not understand where best
> to put the patch, and thus the committer decides.  However, it would be
> very disappointing for a contributor who uses branch-1 to build a new
> feature only to have the committer put it only in master.  So I would
> modify your modification to say "at the discretion of the contributor and
> Hive committers".
>
> Alan.
>
>   kulkarni.swarnim@gmail.com
>  May 22, 2015 at 11:41
> +1 on the new proposal. Feedback below:
>
> > New features must be put into master.  Whether to put them into
> branch-1 is at the discretion of the developer.
>
> How about we change this to "*All* features must be put into master.
> Whether to put them into branch-1 is at the discretion of the *committer*."
> The reason I think is going forward for us to sustain as a happy and
> healthy community, it's imperative for us to make it not only easy for the
> users, but also for developers and committers to contribute/commit patches.
> To me being a hive contributor would be hard to determine which branch my
> code belongs. Also IMO(and I might be wrong) but many committers have their
> own areas of expertise and it's also very hard for them to immediately
> determine what branch a patch should go to unless very well documented
> somewhere. Putting all code into the master would be an easy approach to
> follow and then cherry picking to other branches can be done. So even if
> people forget to do that, we can always go back to master and port the
> patches out to these branches. So we have a master branch, a branch-1 for
> stable code, branch-2 for experimental and "bleeding edge" code and so on.
> Once branch-2 is stable, we deprecate branch-1, create branch-3 and move on.
>
> Another reason I say this is because in my experience, a pretty
> significant amount of work is hive is still bug fixes and I think that is
> what the user cares most about(correctness above anything else). So with
> this approach, might be very obvious to what branches to commit this to.
>
>
>
>
> --
> Swarnim
>    Chris Drome <cd...@yahoo-inc.com.INVALID>
>  May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where more
> disruptive work can go on without affecting branch-1. While not necessarily
> against this approach, from Yahoo's standpoint, I do have some questions
> (concerns).
> Upgrading to a new version of Hive requires a significant commitment of
> time and resources to stabilize and certify a build for deployment to our
> clusters. Given the size of our clusters and scale of datasets, we have to
> be particularly careful about adopting new functionality. However, at the
> same time we are interested in new testing and making available new
> features and functionality. That said, we would have to rely on branch-1
> for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point
> there would be no option but for users to move to branch-2 as branch-1
> would be effectively end-of-lifed. I'm not sure how long this would take,
> but it would eventually happen as a direct result of the very reason for
> creating branch-2.
> A related concern is how disruptive the code changes will be in branch-2.
> I imagine that changes in early in branch-2 will be easy to backport to
> branch-1, while this effort will become more difficult, if not impractical,
> as time goes. If the code bases diverge too much then this could lead to
> more pressure for users of branch-1 to add features just to branch-1, which
> has been mentioned as undesirable. By the same token, backporting any code
> in branch-2 will require an increasing amount of effort, which contributors
> to branch-2 may not be interested in committing to.
> These questions affect us directly because, while we require a certain
> amount of stability, we also like to pull in new functionality that will be
> of value to our users. For example, our current 0.13 release is probably
> closer to 0.14 at this point. Given the lifespan of a release, it is often
> more palatable to backport features and bugfixes than to jump to a new
> version.
>
> The good thing about this proposal is the opportunity to evaluate and
> clean up alot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> <se...@hortonworks.com> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>
>
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what Hbase did
> (which is not pretty imho), where 0.94 version had ~30 dot releases
> because people cannot upgrade to 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR at this day and age (and more
> so as time passes) are people who either don’t care about perf and new
> features, only stability; so, stability-focused branch would be perfect to
> support them.
>
>
>
>   Edward Capriolo <ed...@gmail.com>
>  May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of Hadoop code base with
> a simple shim layer. I would rather we stick to the shim layer. I think
> this was easily the best part about hive was that a single release worked
> well regardless of your hadoop version. It was also a key element to hive's
> success. I do not want to see us have multiple branches.
>
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Nick Dimiduk <nd...@gmail.com>.

On Fri, May 22, 2015 at 1:19 PM, Alan Gates <al...@gmail.com> wrote:

> I see your point on saying the contributor may not understand where best
> to put the patch, and thus the committer decides.  However, it would be
> very disappointing for a contributor who uses branch-1 to build a new
> feature only to have the committer put it only in master.  So I would
> modify your modification to say "at the discretion of the contributor and
> Hive committers".
>

For what it's worth, this is more or less how HBase works. All features land
first in master and then percolate backwards to open, active branches
where it's acceptable to do so. Since our 1.0 release, we're trying to
make 1.0+ follow semantic versioning more closely. This means that new
features never land in a released minor branch. Bug fixes are applied to
all applicable branches; sometimes this means older release branches and
not master. Sometimes that means contributors are forced to upgrade in
order to take advantage of their contribution in an Apache release (they're
free to run their own patched builds as they like; it's open source). Right
now we have:

master -> (unreleased, development branch for eventual 2.0)
branch-1 -> (unreleased, development branch for 1.x series, soon to be
branch basis for 1.2)
branch-1.1 -> (released branch, accepting only bug fixes for 1.1.x line)
branch-1.0 -> (released branch, accepting only bug fixes for 1.0.x line)

When we're ready, branch-1.2 will fork from branch-1 and branch-1 will
become development branch for 1.3. Eventually we'll decide it's time for
2.0 and master will be branched, creating branch-2. branch-2 will follow
the same process.
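
As a rough illustration of the policy described above, here is a minimal
Python sketch. It is not HBase or Hive tooling; the Branch class, the
"released" flag, and the eligible_branches function are hypothetical names
made up for this example. It assumes a feature may go to any branch that has
not yet had a release cut from it, while a bug fix may be considered for
every branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str
    released: bool  # True once a release has been cut from this branch


# Branch layout as listed in the message above.
BRANCHES = [
    Branch("master", released=False),      # development for eventual 2.0
    Branch("branch-1", released=False),    # development for the 1.x series
    Branch("branch-1.1", released=True),   # bug fixes only for 1.1.x
    Branch("branch-1.0", released=True),   # bug fixes only for 1.0.x
]


def eligible_branches(kind, branches=BRANCHES):
    """Return the names of branches that may accept a change of this kind."""
    if kind == "feature":
        # Features land in master first and may percolate back only to
        # branches that are still unreleased development branches.
        return [b.name for b in branches if not b.released]
    if kind == "bugfix":
        # Bug fixes may be considered for every branch; whether a fix is
        # actually applicable to a given branch is decided case by case.
        return [b.name for b in branches]
    raise ValueError(f"unknown change kind: {kind!r}")


if __name__ == "__main__":
    print(eligible_branches("feature"))
    print(eligible_branches("bugfix"))
```

In this sketch the release manager's judgment is reduced to a single flag per
branch; in practice each release manager decides what is acceptable for the
branch they own.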

We also maintain active branches for 0.98.x and 0.94.x. These branches are
"different", following our old model of receiving backward-compatible new
features in .x versions. 0.94 is basically retired now, only getting bug
fixes. 0.94 is hadoop-1 only, 0.98 supports both hadoop-1 and hadoop-2
(maybe we've retired hadoop-1 support here in the .12 release?), and 1.x
supports hadoop-2 only. 2.0 is undecided, but presumably will be hadoop-2
and hadoop-3 if we can extend our shim layer for it.

We have separate release managers for 0.94, 0.98, 1.0, and 1.1, and we're
discussing preparations for 1.2. They enforce commits against their
respective branches.


>   kulkarni.swarnim@gmail.com
>  May 22, 2015 at 11:41
> +1 on the new proposal. Feedback below:
>
> > New features must be put into master.  Whether to put them into
> branch-1 is at the discretion of the developer.
>
> How about we change this to "*All* features must be put into master.
> Whether to put them into branch-1 is at the discretion of the *committer*."
> The reason I think is going forward for us to sustain as a happy and
> healthy community, it's imperative for us to make it not only easy for the
> users, but also for developers and committers to contribute/commit patches.
> To me being a hive contributor would be hard to determine which branch my
> code belongs. Also IMO(and I might be wrong) but many committers have their
> own areas of expertise and it's also very hard for them to immediately
> determine what branch a patch should go to unless very well documented
> somewhere. Putting all code into the master would be an easy approach to
> follow and then cherry picking to other branches can be done. So even if
> people forget to do that, we can always go back to master and port the
> patches out to these branches. So we have a master branch, a branch-1 for
> stable code, branch-2 for experimental and "bleeding edge" code and so on.
> Once branch-2 is stable, we deprecate branch-1, create branch-3 and move on.
>
> Another reason I say this is because in my experience, a pretty
> significant amount of work is hive is still bug fixes and I think that is
> what the user cares most about(correctness above anything else). So with
> this approach, might be very obvious to what branches to commit this to.
>
>
>
>
> --
> Swarnim
>    Chris Drome <cd...@yahoo-inc.com.INVALID>
>  May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where more
> disruptive work can go on without affecting branch-1. While not necessarily
> against this approach, from Yahoo's standpoint, I do have some questions
> (concerns).
> Upgrading to a new version of Hive requires a significant commitment of
> time and resources to stabilize and certify a build for deployment to our
> clusters. Given the size of our clusters and scale of datasets, we have to
> be particularly careful about adopting new functionality. However, at the
> same time we are interested in new testing and making available new
> features and functionality. That said, we would have to rely on branch-1
> for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point
> there would be no option but for users to move to branch-2 as branch-1
> would be effectively end-of-lifed. I'm not sure how long this would take,
> but it would eventually happen as a direct result of the very reason for
> creating branch-2.
> A related concern is how disruptive the code changes will be in branch-2.
> I imagine that changes in early in branch-2 will be easy to backport to
> branch-1, while this effort will become more difficult, if not impractical,
> as time goes. If the code bases diverge too much then this could lead to
> more pressure for users of branch-1 to add features just to branch-1, which
> has been mentioned as undesirable. By the same token, backporting any code
> in branch-2 will require an increasing amount of effort, which contributors
> to branch-2 may not be interested in committing to.
> These questions affect us directly because, while we require a certain
> amount of stability, we also like to pull in new functionality that will be
> of value to our users. For example, our current 0.13 release is probably
> closer to 0.14 at this point. Given the lifespan of a release, it is often
> more palatable to backport features and bugfixes than to jump to a new
> version.
>
> The good thing about this proposal is the opportunity to evaluate and
> clean up alot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> <se...@hortonworks.com> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>
>
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what Hbase did
> (which is not pretty imho), where 0.94 version had ~30 dot releases
> because people cannot upgrade to 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR at this day and age (and more
> so as time passes) are people who either don’t care about perf and new
> features, only stability; so, stability-focused branch would be perfect to
> support them.
>
>
>
>   Edward Capriolo <ed...@gmail.com>
>  May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of Hadoop code base with
> a simple shim layer. I would rather we stick to the shim layer. I think
> this was easily the best part about hive was that a single release worked
> well regardless of your hadoop version. It was also a key element to hive's
> success. I do not want to see us have multiple branches.
>
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.

I agree with *All* features with the exception that some features might 
be branch-1 specific (if it's a feature on something no longer supported 
in master, like hadoop-1).  Without this we prevent new features for 
older technology, which doesn't strike me as reasonable.

I see your point on saying the contributor may not understand where best 
to put the patch, and thus the committer decides.  However, it would be 
very disappointing for a contributor who uses branch-1 to build a new 
feature only to have the committer put it only in master.  So I would 
modify your modification to say "at the discretion of the contributor 
and Hive committers".

Alan.

> kulkarni.swarnim@gmail.com <ma...@gmail.com>
> May 22, 2015 at 11:41
> +1 on the new proposal. Feedback below:
>
> > New features must be put into master.  Whether to put them into 
> branch-1 is at the discretion of the developer.
>
> How about we change this to "*_All_* features must be put into master. 
> Whether to put them into branch-1 is at the discretion of the 
> *_committer_*." My reasoning is that, for us to sustain a happy and
> healthy community going forward, it's imperative to make it easy not
> only for users, but also for developers and committers to
> contribute/commit patches. As a Hive contributor, I would find it
> hard to determine which branch my code belongs in. Also, IMO (and I
> might be wrong), many committers have their own areas of expertise, and
> it's very hard for them to immediately determine which branch a
> patch should go to unless that is well documented somewhere. Putting all
> code into the master would be an easy approach to follow and then 
> cherry picking to other branches can be done. So even if people forget 
> to do that, we can always go back to master and port the patches out 
> to these branches. So we have a master branch, a branch-1 for stable 
> code, branch-2 for experimental and "bleeding edge" code and so on. 
> Once branch-2 is stable, we deprecate branch-1, create branch-3 and 
> move on.
>
> Another reason I say this is that, in my experience, a pretty
> significant amount of work in Hive is still bug fixes, and I think that
> is what users care most about (correctness above anything else). So
> with this approach, it would be obvious which branches to commit
> these to.
>
>
>
>
> -- 
> Swarnim
> Chris Drome <ma...@yahoo-inc.com.INVALID>
> May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where 
> more disruptive work can go on without affecting branch-1. While not 
> necessarily against this approach, from Yahoo's standpoint, I do have 
> some questions (concerns).
> Upgrading to a new version of Hive requires a significant commitment 
> of time and resources to stabilize and certify a build for deployment 
> to our clusters. Given the size of our clusters and scale of datasets, 
> we have to be particularly careful about adopting new functionality. 
> However, at the same time we are interested in testing and making
> new features and functionality available. That said, we would have to
> rely on branch-1 for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point 
> there would be no option but for users to move to branch-2 as branch-1 
> would be effectively end-of-lifed. I'm not sure how long this would 
> take, but it would eventually happen as a direct result of the very 
> reason for creating branch-2.
> A related concern is how disruptive the code changes will be in 
> branch-2. I imagine that changes early in branch-2 will be easy to
> backport to branch-1, while this effort will become more difficult, if
> not impractical, as time goes on. If the code bases diverge too much then
> this could lead to more pressure for users of branch-1 to add features 
> just to branch-1, which has been mentioned as undesirable. By the same 
> token, backporting any code in branch-2 will require an increasing 
> amount of effort, which contributors to branch-2 may not be interested 
> in committing to.
> These questions affect us directly because, while we require a certain 
> amount of stability, we also like to pull in new functionality that 
> will be of value to our users. For example, our current 0.13 release 
> is probably closer to 0.14 at this point. Given the lifespan of a 
> release, it is often more palatable to backport features and bugfixes 
> than to jump to a new version.
>
> The good thing about this proposal is the opportunity to evaluate and 
> clean up a lot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin 
> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what Hbase did
> (which is not pretty imho), where 0.94 version had ~30 dot releases
> because people cannot upgrade to 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR at this day and age (and more
> so as time passes) are people who either don’t care about perf and new
> features, only stability; so, stability-focused branch would be perfect to
> support them.
>
>
>
> Edward Capriolo <ma...@gmail.com>
> May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of Hadoop code base
> with a simple shim layer. I would rather we stick to the shim layer. I
> think this was easily the best part about hive was that a single release
> worked well regardless of your hadoop version. It was also a key element
> to hive's success. I do not want to see us have multiple branches.
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.

+1 on the new proposal. Feedback below:

> New features must be put into master.  Whether to put them into branch-1
> is at the discretion of the developer.

How about we change this to "*All* features must be put into master.
Whether to put them into branch-1 is at the discretion of the *committer*."
My reasoning is that, for us to sustain a happy and healthy community
going forward, it's imperative to make it easy not only for users, but
also for developers and committers to contribute/commit patches. As a
Hive contributor, I would find it hard to determine which branch my code
belongs in. Also, IMO (and I might be wrong), many committers have their
own areas of expertise, and it's very hard for them to immediately
determine which branch a patch should go to unless that is well documented
somewhere. Putting all code into the master would be an easy approach to
follow and then cherry picking to other branches can be done. So even if
people forget to do that, we can always go back to master and port the
patches out to these branches. So we have a master branch, a branch-1 for
stable code, branch-2 for experimental and "bleeding edge" code and so on.
Once branch-2 is stable, we deprecate branch-1, create branch-3 and move on.

Another reason I say this is that, in my experience, a pretty significant
amount of work in Hive is still bug fixes, and I think that is what users
care most about (correctness above anything else). So with this approach,
it would be obvious which branches to commit these to.

On Fri, May 22, 2015 at 1:11 PM, Alan Gates <al...@gmail.com> wrote:

> Thanks for your feedback Chris.  It sounds like there are a couple of
> reasonable concerns being voiced repeatedly:
> 1) Fragmentation, the two branches will drift too far apart.
> 2) Stagnation, branch-1 will effectively become a dead-end.
>
> So I modify the proposal as follows to deal with those:
>
> 1) New features must be put into master.  Whether to put them into
> branch-1 is at the discretion of the developer.  The exception would be
> features that would not apply in master (e.g. say someone developed a way
> to double the speed of map reduce jobs Hive produces).  For example, I
> might choose to put the materialized view work I'm doing in both branch-1
> and master, but the HBase metastore work only in master.  This should avoid
> fragmentation by keeping branch-1 a subset of master.
>
> 2) For the next 12 months we will port critical bug fixes (crashes,
> security issues, wrong results) to branch-1 as well as fixing them on
> master.  We might choose to lengthen this time depending on how stable
> master is and how fast the uptake is.  This avoids branch-1 being
> immediately abandoned by developers while users are still depending on it.
>
> Alan.
>
>   Chris Drome <cd...@yahoo-inc.com.INVALID>
>  May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where more
> disruptive work can go on without affecting branch-1. While not necessarily
> against this approach, from Yahoo's standpoint, I do have some questions
> (concerns).
> Upgrading to a new version of Hive requires a significant commitment of
> time and resources to stabilize and certify a build for deployment to our
> clusters. Given the size of our clusters and scale of datasets, we have to
> be particularly careful about adopting new functionality. However, at the
> same time we are interested in testing and making new features and
> functionality available. That said, we would have to rely on branch-1
> for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point
> there would be no option but for users to move to branch-2 as branch-1
> would be effectively end-of-lifed. I'm not sure how long this would take,
> but it would eventually happen as a direct result of the very reason for
> creating branch-2.
> A related concern is how disruptive the code changes will be in branch-2.
> I imagine that changes early in branch-2 will be easy to backport to
> branch-1, while this effort will become more difficult, if not impractical,
> as time goes on. If the code bases diverge too much then this could lead to
> more pressure for users of branch-1 to add features just to branch-1, which
> has been mentioned as undesirable. By the same token, backporting any code
> in branch-2 will require an increasing amount of effort, which contributors
> to branch-2 may not be interested in committing to.
> These questions affect us directly because, while we require a certain
> amount of stability, we also like to pull in new functionality that will be
> of value to our users. For example, our current 0.13 release is probably
> closer to 0.14 at this point. Given the lifespan of a release, it is often
> more palatable to backport features and bugfixes than to jump to a new
> version.
>
> The good thing about this proposal is the opportunity to evaluate and
> clean up a lot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
>   Sergey Shelukhin <se...@hortonworks.com>
>  May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what Hbase did
> (which is not pretty imho), where 0.94 version had ~30 dot releases
> because people cannot upgrade to 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR at this day and age (and more
> so as time passes) are people who either don’t care about perf and new
> features, only stability; so, stability-focused branch would be perfect to
> support them.
>
>
>
>   Edward Capriolo <ed...@gmail.com>
>  May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of Hadoop code base with
> a simple shim layer. I would rather we stick to the shim layer. I think
> this was easily the best part about hive was that a single release worked
> well regardless of your hadoop version. It was also a key element to hive's
> success. I do not want to see us have multiple branches.
>
>
>   Xuefu Zhang <xz...@cloudera.com>
>  May 15, 2015 at 22:29
> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more problems
> than the confusion of two lines of releases. Essentially, this proposal
> forces a user to make a hard choice between a stabler, legacy-aware release
> line and an adventurous, pioneering release line. And once the choice is
> made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as
> proposed. I develop a new feature which I think useful for both branches.
> So, I commit it to both branches. My feature requires additional schema
> support, so I provide upgrade scripts for both branches. The scripts are
> different because the two branches have already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is all
> good as long as a user stays in his line. The moment the user considers a
> switch, most likely, from branch-1 to branch-2, he is stuck. Why? Because
> there is no upgrade path from a release in branch-1 to a release in
> branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, where
> M and N are the number of releases in the two branches, respectively. This
> is going to be next to a nightmare, not only for users, but also for us.
>
> Also, the proposal will require two sets of things that Hive provides:
> double documentation, double feature tracking, double build/test
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in hadoop
> releases, where 0.23 release was greater than 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things such
> as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see
> it, however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, without a rationale that supports the desire. A dev might
> write code that accidentally breaks the hadoop-1 build. However, this is
> more a build infrastructure problem than a burden of supporting hadoop-1.
> If our build could catch it at precommit test, then I would think the
> accident can be well avoided. Most of the time, fixing the build is
> trivial. And we have already addressed the build infrastructure problem.
>
> Secondly, if we do have a strong reason to deprecate something, we should
> have a deprecation plan rather than declaring on the spot that the current
> release is the last one supporting X. I think Microsoft did a better job in
> terms of product deprecation. For instance, they announced the end of
> support for Windows XP long before the last day. In my opinion, we should
> have a similar vision, giving users and distributions enough time to
> adjust rather than shocking them with breaking news.
>
> In summary, I do see the need of deprecation in Hive, but I am afraid the
> way we take, including the proposal here, isn't going to nicely solve the
> problem. On the contrary, I foresee a spectrum of confusion, frustration,
> and burden for the user as well as for developers.
>
> Thanks,
> Xuefu
>
>
>


-- 
Swarnim

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.

Thanks for your feedback Chris.  It sounds like there are a couple of 
reasonable concerns being voiced repeatedly:
1) Fragmentation, the two branches will drift too far apart.
2) Stagnation, branch-1 will effectively become a dead-end.

So I modify the proposal as follows to deal with those:

1) New features must be put into master.  Whether to put them into 
branch-1 is at the discretion of the developer.  The exception would be 
features that would not apply in master (e.g. say someone developed a 
way to double the speed of map reduce jobs Hive produces).  For example, 
I might choose to put the materialized view work I'm doing in both 
branch-1 and master, but the HBase metastore work only in master.  This 
should avoid fragmentation by keeping branch-1 a subset of master.

2) For the next 12 months we will port critical bug fixes (crashes, 
security issues, wrong results) to branch-1 as well as fixing them on 
master.  We might choose to lengthen this time depending on how stable 
master is and how fast the uptake is.  This avoids branch-1 being 
immediately abandoned by developers while users are still depending on it.

Alan.

> Chris Drome <ma...@yahoo-inc.com.INVALID>
> May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where 
> more disruptive work can go on without affecting branch-1. While not 
> necessarily against this approach, from Yahoo's standpoint, I do have 
> some questions (concerns).
> Upgrading to a new version of Hive requires a significant commitment 
> of time and resources to stabilize and certify a build for deployment 
> to our clusters. Given the size of our clusters and scale of datasets, 
> we have to be particularly careful about adopting new functionality. 
> However, at the same time we are interested in testing and making
> new features and functionality available. That said, we would have to
> rely on branch-1 for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point 
> there would be no option but for users to move to branch-2 as branch-1 
> would be effectively end-of-lifed. I'm not sure how long this would 
> take, but it would eventually happen as a direct result of the very 
> reason for creating branch-2.
> A related concern is how disruptive the code changes will be in 
> branch-2. I imagine that changes early in branch-2 will be easy to
> backport to branch-1, while this effort will become more difficult, if
> not impractical, as time goes on. If the code bases diverge too much then
> this could lead to more pressure for users of branch-1 to add features 
> just to branch-1, which has been mentioned as undesirable. By the same 
> token, backporting any code in branch-2 will require an increasing 
> amount of effort, which contributors to branch-2 may not be interested 
> in committing to.
> These questions affect us directly because, while we require a certain 
> amount of stability, we also like to pull in new functionality that 
> will be of value to our users. For example, our current 0.13 release 
> is probably closer to 0.14 at this point. Given the lifespan of a 
> release, it is often more palatable to backport features and bugfixes 
> than to jump to a new version.
>
> The good thing about this proposal is the opportunity to evaluate and 
> clean up a lot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin 
> <se...@hortonworks.com> wrote:
>
>
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
>
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
>
> Alternatively, when sweeping changes are made, we can do what Hbase did
> (which is not pretty imho), where 0.94 version had ~30 dot releases
> because people cannot upgrade to 0.96 “singularity” release.
>
>
> I posit that people who run Hadoop 1 and MR at this day and age (and more
> so as time passes) are people who either don’t care about perf and new
> features, only stability; so, stability-focused branch would be perfect to
> support them.
>
>
>
> Edward Capriolo <ma...@gmail.com>
> May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of Hadoop code base
> with a simple shim layer. I would rather we stick to the shim layer. I
> think this was easily the best part about hive was that a single release
> worked well regardless of your hadoop version. It was also a key element
> to hive's success. I do not want to see us have multiple branches.
>
>
> Xuefu Zhang <ma...@cloudera.com>
> May 15, 2015 at 22:29
> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more 
> problems than the confusion of two lines of releases. Essentially, 
> this proposal forces a user to make a hard choice between a stabler, 
> legacy-aware release line and an adventurous, pioneering release line. 
> And once the choice is made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as 
> proposed. I develop a new feature which I think useful for both 
> branches. So, I commit it to both branches. My feature requires 
> additional schema support, so I provide upgrade scripts for both 
> branches. The scripts are different because the two branches have 
> already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is 
> all good as long as a user stays in his line. The moment the user 
> considers a switch, most likely, from branch-1 to branch-2, he is
> stuck. Why? Because there is no upgrade path from a release in 
> branch-1 to a release in branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, 
> where M and N are the number of releases in the two branches, 
> respectively. This is going to be next to a nightmare, not only for 
> users, but also for us.
>
> Also, the proposal will require two sets of things that Hive provides: 
> double documentation, double feature tracking, double build/test 
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in hadoop 
> releases, where 0.23 release was greater than 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things
> such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved.
> As I see it, however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, without a rationale that supports the desire. A dev might
> write code that accidentally breaks the hadoop-1 build. However, this
> is more a build infrastructure problem than a burden of supporting
> hadoop-1. If our build could catch it at precommit test, then I would
> think the accident can be well avoided. Most of the time, fixing the
> build is trivial. And we have already addressed the build
> infrastructure problem.
>
> Secondly, if we do have a strong reason to deprecate something, we
> should have a deprecation plan rather than declaring on the spot that
> the current release is the last one supporting X. I think Microsoft
> did a better job in terms of product deprecation. For instance, they
> announced the end of support for Windows XP long before the last day.
> In my opinion, we should have a similar vision, giving users and
> distributions enough time to adjust rather than shocking them with
> breaking news.
>
> In summary, I do see the need of deprecation in Hive, but I am afraid 
> the way we take, including the proposal here, isn't going to nicely 
> solve the problem. On the contrary, I foresee a spectrum of confusion, 
> frustration, and burden for the user as well as for developers.
>
> Thanks,
> Xuefu
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Chris Drome <cd...@yahoo-inc.com.INVALID>.

I understand the motivation and benefits of creating a branch-2 where more disruptive work can go on without affecting branch-1. While not necessarily against this approach, from Yahoo's standpoint, I do have some questions (concerns).
Upgrading to a new version of Hive requires a significant commitment of time and resources to stabilize and certify a build for deployment to our clusters. Given the size of our clusters and scale of datasets, we have to be particularly careful about adopting new functionality. However, at the same time we are interested in testing and making new features and functionality available. That said, we would have to rely on branch-1 for the immediate future.
One concern is that branch-1 would be left to stagnate, at which point there would be no option but for users to move to branch-2 as branch-1 would be effectively end-of-lifed. I'm not sure how long this would take, but it would eventually happen as a direct result of the very reason for creating branch-2.
A related concern is how disruptive the code changes will be in branch-2. I imagine that changes early in branch-2 will be easy to backport to branch-1, while this effort will become more difficult, if not impractical, as time goes on. If the code bases diverge too much then this could lead to more pressure for users of branch-1 to add features just to branch-1, which has been mentioned as undesirable. By the same token, backporting any code in branch-2 will require an increasing amount of effort, which contributors to branch-2 may not be interested in committing to.
These questions affect us directly because, while we require a certain amount of stability, we also like to pull in new functionality that will be of value to our users. For example, our current 0.13 release is probably closer to 0.14 at this point. Given the lifespan of a release, it is often more palatable to backport features and bugfixes than to jump to a new version.

The good thing about this proposal is the opportunity to evaluate and clean up a lot of the old code.
Thanks,
chris
 


     On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin <se...@hortonworks.com> wrote:
   

 Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.

On 15/5/18, 11:46, "Sergey Shelukhin" <se...@hortonworks.com> wrote:

>I think we need some path for deprecating old Hadoop versions, the same
>way we deprecate old Java version support or old RDBMS version support.
>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
>goes for stuff like MR; supporting it, esp. for perf work, becomes a
>burden, and it’s outdated with 2 alternatives, one of which has been
>around for 2 releases.
>The branches are a graceful way to get rid of the legacy burden.
>
>Alternatively, when sweeping changes are made, we can do what Hbase did
>(which is not pretty imho), where 0.94 version had ~30 dot releases
>because people cannot upgrade to 0.96 “singularity” release.
>
>
>I posit that people who run Hadoop 1 and MR at this day and age (and more
>so as time passes) are people who either don’t care about perf and new
>features, only stability; so, stability-focused branch would be perfect to
>support them.
>
>
>On 15/5/18, 10:04, "Edward Capriolo" <ed...@gmail.com> wrote:
>
>>Up until recently Hive supported numerous versions of Hadoop code base
>>with
>>a simple shim layer. I would rather we stick to the shim layer. I think
>>this was easily the best part about hive was that a single release worked
>>well regardless of your hadoop version. It was also a key element to
>>hive's
>>success. I do not want to see us have multiple branches.
>>
>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com> wrote:
>>
>>> Thanks for the explanation, Alan!
>>>
>>> While I have understood more on the proposal, I actually see more
>>>problems
>>> than the confusion of two lines of releases. Essentially, this proposal
>>> forces a user to make a hard choice between a stabler, legacy-aware
>>>release
>>> line and an adventurous, pioneering release line. And once the choice
>>>is
>>> made, there is no easy way back or forward.
>>>
>>> Here is my interpretation. Let's say we have two main branches as
>>> proposed. I develop a new feature which I think useful for both
>>>branches.
>>> So, I commit it to both branches. My feature requires additional schema
>>> support, so I provide upgrade scripts for both branches. The scripts
>>>are
>>> different because the two branches have already diverged in schema.
>>>
>>> Now the two branches evolve in a diverging fashion like this. This is
>>>all
>>> good as long as a user stays in his line. The moment the user considers
>>>a
>>> switch, mostly likely, from branch-1 to branch-2, he is stuck. Why?
>>>Because
>>> there is no upgrade path from a release in branch-1 to a release in
>>> branch-2!
>>>
>>> If we want to provide an upgrade path, then there will be MxN paths,
>>>where
>>> M and N are the number of releases in the two branches, respectively.
>>>This
>>> is going to be next to a nightmare, not only for users, but also for
>>>us.
>>>
>>> Also, the proposal will require two sets of things that Hive provides:
>>> double documentation, double feature tracking, double build/test
>>> infrastructures, etc.
>>>
>>> This approach can also potentially cause the problem we saw in hadoop
>>> releases, where 0.23 release was greater than 1.0 release.
>>>
>>> To me, the problem we are trying to solve is deprecating old things
>>>such as
>>> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
>>> however, we approached the problem in less favorable ways.
>>>
>>> First, it seemed we wanted to deprecate something just for the sake of
>>> deprecation, and it's not based on the rationale that supports the
>>>desire.
>>> A dev might write code that accidentally breaks the hadoop-1 build. However,
>>>this
>>> is more a build infrastructure problem rather than the burden of
>>>supporting
>>> hadoop-1. If our build could catch it at precommit test, then I would
>>>think
>>> the accident can be well avoided. Most of the times, fixing the build
>>>is
>>> trivial. And we have already addressed the build infrastructure
>>>problem.
>>>
>>> Secondly, if we do have a strong reason to deprecate something, we
>>>should
>>> have a deprecation plan rather than declaring on the spot that the
>>>current
>>> release is the last one supporting X. I think Microsoft did a better
>>>job in
>>> terms of product deprecation. For instance, they announced the end of
>>> Windows XP support long before the last day. In my opinion, we should have a
>>>similar
>>> vision, giving users, distributions enough time to adjust rather than
>>> shocking them with breaking news.
>>>
>>> In summary, I do see the need of deprecation in Hive, but I am afraid
>>>the
>>> way we take, including the proposal here, isn't going to nicely solve
>>>the
>>> problem. On the contrary, I foresee a spectrum of confusion,
>>>frustration,
>>> and burden for the user as well as for developers.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com>
>>>wrote:
>>>
>>>>
>>>>
>>>>  Xuefu Zhang <xz...@cloudera.com>
>>>>  May 15, 2015 at 17:31
>>>>
>>>> Just make sure that I understand the proposal correctly: we are going
>>>>to
>>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>>
>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>>is
>>>> already well established.
>>>>
>>>>  New features
>>>> are only merged to branch-2. That essentially says we stop development
>>>>for
>>>> hadoop-1, right?
>>>>
>>>>  If developers want to keep contributing patches to branch-1 then
>>>> there's no need for it to stop.  We would want to avoid putting new
>>>> features only on branch-1, unless they only made sense in that
>>>>context.
>>>> But I assume we'll see people contributing to branch-1 for some time.
>>>>
>>>>  Are we also making two lines of releases: one for branch-1
>>>> and one for branch-2? Won't that be confusing and also burdensome if
>>>>we
>>>> release say 1.3, 2.0, 2.1, 1.4...
>>>>
>>>>  I'm asserting that it will be less confusing than the alternatives.
>>>>We
>>>> need some way to make early releases of many of the new features.  I
>>>> believe that this proposal is less confusing than if we start putting
>>>>the
>>>> new features in 1.x branches.  This is particularly true because it
>>>>would
>>>> help us to start being able to drop older functionality like Hadoop-1
>>>>and
>>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>>users.
>>>>
>>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>>
>>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>>
>>>> Alan.
>>>>
>>>>  Thanks,
>>>> Xuefu
>>>>
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>>> <vgumashta@hortonworks.com> wrote:
>>>>
>>>>  +1 on the new branch. I think it’ll help in faster dev time for these
>>>> important changes.
>>>>
>>>>  —Vaibhav
>>>>
>>>>  From: Alan Gates <al...@gmail.com>
>>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>>>> Date: Friday, May 15, 2015 at 4:11 PM
>>>> To: "dev@hive.apache.org" <de...@hive.apache.org>
>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>>
>>>>  Anyone else have feedback on this?  If not I'll start a vote next
>>>>week.
>>>>
>>>> Alan.
>>>>
>>>>    Gopal Vijayaraghavan <go...@apache.org>
>>>> May 14, 2015 at 10:44
>>>>  Hi,
>>>>
>>>> +1 on the idea.
>>>>
>>>> Having a stable release branch with ongoing fixes where we do not drop
>>>> major features would be good all around.
>>>>
>>>> It lets us accelerate the pace of development, drop major features or
>>>> rewrite them entirely without dragging everyone else kicking &
>>>>screaming
>>>> into that release.
>>>>
>>>> Cheers,
>>>> Gopal
>>>>
>>>>
>>>>
>>>>    Sergey Shelukhin <se...@hortonworks.com>
>>>> May 11, 2015 at 19:17
>>>>  That sounds like a good idea.
>>>> Some features could be back ported to branch-1 if viable, but at least
>>>>new
>>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>>> Probably also a good place to enable vectorization and other perf
>>>>features
>>>> by default while we make alpha releases.
>>>>
>>>> +1
>>>>
>>>>
>>>>    Alan Gates <al...@gmail.com>
>>>> May 11, 2015 at 15:38
>>>>  There is a lot of forward-looking work going on in various branches
>>>>of
>>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>>>would
>>>> be good to have a way to release this code to users so that they can
>>>> experiment with it.  Releasing it will also provide feedback to
>>>>developers.
>>>>
>>>> At the same time there are discussions on whether to keep supporting
>>>> Hadoop-1.  The burden of supporting older, less used functionality
>>>>such as
>>>> Hadoop-1 is becoming ever harder as many new features are added.
>>>>
>>>> I propose that the best way to deal with this would be to make a
>>>> branch-1.  We could continue to make new feature releases off of this
>>>> branch (1.3, 1.4, etc.).  This branch would not drop old
>>>>functionality.
>>>> This provides stability and continuity for users and developers.
>>>>
>>>> We could then merge these new features branches (LLAP, HBase
>>>>metastore,
>>>> CLI drop) into the trunk, as well as turn on by default newer features
>>>>such
>>>> as the vectorization and ACID.  We could also drop older, less used
>>>> features such as support for Hadoop-1 and MapReduce.  It will be a
>>>>while
>>>> before we are ready to make stable, production ready releases of this
>>>> code.  But we could start making alpha quality releases soon.  We
>>>>would
>>>> call these releases 2.x, to stress the non-backward compatible changes
>>>>such
>>>> as dropping Hadoop-1.  This will give users a chance to play with the
>>>>new
>>>> code and developers a chance to get feedback.
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>>
>>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Sergey Shelukhin <se...@hortonworks.com>.

Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.

On 15/5/18, 11:46, "Sergey Shelukhin" <se...@hortonworks.com> wrote:

>I think we need some path for deprecating old Hadoop versions, the same
>way we deprecate old Java version support or old RDBMS version support.
>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
>goes for stuff like MR; supporting it, esp. for perf work, becomes a
>burden, and it’s outdated with 2 alternatives, one of which has been
>around for 2 releases.
>The branches are a graceful way to get rid of the legacy burden.
>
>Alternatively, when sweeping changes are made, we can do what HBase did
>(which is not pretty imho), where the 0.94 version had ~30 dot releases
>because people cannot upgrade to the 0.96 “singularity” release.
>
>
>I posit that people who run Hadoop 1 and MR in this day and age (and more
>so as time passes) are people who don’t care about perf and new
>features, only stability; so a stability-focused branch would be perfect to
>support them.
>
>
>On 15/5/18, 10:04, "Edward Capriolo" <ed...@gmail.com> wrote:
>
>>Up until recently Hive supported numerous versions of Hadoop code base
>>with
>>a simple shim layer. I would rather we stick to the shim layer. I think
>>easily the best part about hive was that a single release worked
>>well regardless of your hadoop version. It was also a key element to
>>hive's
>>success. I do not want to see us have multiple branches.
>>
>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com> wrote:
>>
>>> Thanks for the explanation, Alan!
>>>
>>> While I have understood more on the proposal, I actually see more
>>>problems
>>> than the confusion of two lines of releases. Essentially, this proposal
>>> forces a user to make a hard choice between a stabler, legacy-aware
>>>release
>>> line and an adventurous, pioneering release line. And once the choice
>>>is
>>> made, there is no easy way back or forward.
>>>
>>> Here is my interpretation. Let's say we have two main branches as
>>> proposed. I develop a new feature which I think useful for both
>>>branches.
>>> So, I commit it to both branches. My feature requires additional schema
>>> support, so I provide upgrade scripts for both branches. The scripts
>>>are
>>> different because the two branches have already diverged in schema.
>>>
>>> Now the two branches evolve in a diverging fashion like this. This is
>>>all
>>> good as long as a user stays in his line. The moment the user considers
>>>a
>>> switch, most likely, from branch-1 to branch-2, he is stuck. Why?
>>>Because
>>> there is no upgrade path from a release in branch-1 to a release in
>>> branch-2!
>>>
>>> If we want to provide an upgrade path, then there will be MxN paths,
>>>where
>>> M and N are the number of releases in the two branches, respectively.
>>>This
>>> is going to be next to a nightmare, not only for users, but also for
>>>us.
>>>
>>> Also, the proposal will require two sets of things that Hive provides:
>>> double documentation, double feature tracking, double build/test
>>> infrastructures, etc.
>>>
>>> This approach can also potentially cause the problem we saw in hadoop
>>> releases, where 0.23 release was greater than 1.0 release.
>>>
>>> To me, the problem we are trying to solve is deprecating old things
>>>such as
>>> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
>>> however, we approached the problem in less favorable ways.
>>>
>>> First, it seemed we wanted to deprecate something just for the sake of
>>> deprecation, and it's not based on the rationale that supports the
>>>desire.
>>> A dev might write code that accidentally breaks the hadoop-1 build. However,
>>>this
>>> is more a build infrastructure problem rather than the burden of
>>>supporting
>>> hadoop-1. If our build could catch it at precommit test, then I would
>>>think
>>> the accident can be well avoided. Most of the times, fixing the build
>>>is
>>> trivial. And we have already addressed the build infrastructure
>>>problem.
>>>
>>> Secondly, if we do have a strong reason to deprecate something, we
>>>should
>>> have a deprecation plan rather than declaring on the spot that the
>>>current
>>> release is the last one supporting X. I think Microsoft did a better
>>>job in
>>> terms of product deprecation. For instance, they announced the end of
>>> Windows XP support long before the last day. In my opinion, we should have a
>>>similar
>>> vision, giving users, distributions enough time to adjust rather than
>>> shocking them with breaking news.
>>>
>>> In summary, I do see the need of deprecation in Hive, but I am afraid
>>>the
>>> way we take, including the proposal here, isn't going to nicely solve
>>>the
>>> problem. On the contrary, I foresee a spectrum of confusion,
>>>frustration,
>>> and burden for the user as well as for developers.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com>
>>>wrote:
>>>
>>>>
>>>>
>>>>   Xuefu Zhang <xz...@cloudera.com>
>>>>  May 15, 2015 at 17:31
>>>>
>>>> Just make sure that I understand the proposal correctly: we are going
>>>>to
>>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>>
>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>>is
>>>> already well established.
>>>>
>>>>  New features
>>>> are only merged to branch-2. That essentially says we stop development
>>>>for
>>>> hadoop-1, right?
>>>>
>>>>  If developers want to keep contributing patches to branch-1 then
>>>> there's no need for it to stop.  We would want to avoid putting new
>>>> features only on branch-1, unless they only made sense in that
>>>>context.
>>>> But I assume we'll see people contributing to branch-1 for some time.
>>>>
>>>>  Are we also making two lines of releases: one for branch-1
>>>> and one for branch-2? Won't that be confusing and also burdensome if
>>>>we
>>>> release say 1.3, 2.0, 2.1, 1.4...
>>>>
>>>>  I'm asserting that it will be less confusing than the alternatives.
>>>>We
>>>> need some way to make early releases of many of the new features.  I
>>>> believe that this proposal is less confusing than if we start putting
>>>>the
>>>> new features in 1.x branches.  This is particularly true because it
>>>>would
>>>> help us to start being able to drop older functionality like Hadoop-1
>>>>and
>>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>>users.
>>>>
>>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>>
>>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>>
>>>> Alan.
>>>>
>>>>  Thanks,
>>>> Xuefu
>>>>
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>>> <vgumashta@hortonworks.com> wrote:
>>>>
>>>>  +1 on the new branch. I think it’ll help in faster dev time for these
>>>> important changes.
>>>>
>>>>  —Vaibhav
>>>>
>>>>   From: Alan Gates <al...@gmail.com>
>>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>>>> Date: Friday, May 15, 2015 at 4:11 PM
>>>> To: "dev@hive.apache.org" <de...@hive.apache.org>
>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>>
>>>>  Anyone else have feedback on this?  If not I'll start a vote next
>>>>week.
>>>>
>>>> Alan.
>>>>
>>>>    Gopal Vijayaraghavan <go...@apache.org>
>>>> May 14, 2015 at 10:44
>>>>   Hi,
>>>>
>>>> +1 on the idea.
>>>>
>>>> Having a stable release branch with ongoing fixes where we do not drop
>>>> major features would be good all around.
>>>>
>>>> It lets us accelerate the pace of development, drop major features or
>>>> rewrite them entirely without dragging everyone else kicking &
>>>>screaming
>>>> into that release.
>>>>
>>>> Cheers,
>>>> Gopal
>>>>
>>>>
>>>>
>>>>    Sergey Shelukhin <se...@hortonworks.com>
>>>> May 11, 2015 at 19:17
>>>>   That sounds like a good idea.
>>>> Some features could be back ported to branch-1 if viable, but at least
>>>>new
>>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>>> Probably also a good place to enable vectorization and other perf
>>>>features
>>>> by default while we make alpha releases.
>>>>
>>>> +1
>>>>
>>>>
>>>>    Alan Gates <al...@gmail.com>
>>>> May 11, 2015 at 15:38
>>>>   There is a lot of forward-looking work going on in various branches
>>>>of
>>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>>>would
>>>> be good to have a way to release this code to users so that they can
>>>> experiment with it.  Releasing it will also provide feedback to
>>>>developers.
>>>>
>>>> At the same time there are discussions on whether to keep supporting
>>>> Hadoop-1.  The burden of supporting older, less used functionality
>>>>such as
>>>> Hadoop-1 is becoming ever harder as many new features are added.
>>>>
>>>> I propose that the best way to deal with this would be to make a
>>>> branch-1.  We could continue to make new feature releases off of this
>>>> branch (1.3, 1.4, etc.).  This branch would not drop old
>>>>functionality.
>>>> This provides stability and continuity for users and developers.
>>>>
>>>> We could then merge these new features branches (LLAP, HBase
>>>>metastore,
>>>> CLI drop) into the trunk, as well as turn on by default newer features
>>>>such
>>>> as the vectorization and ACID.  We could also drop older, less used
>>>> features such as support for Hadoop-1 and MapReduce.  It will be a
>>>>while
>>>> before we are ready to make stable, production ready releases of this
>>>> code.  But we could start making alpha quality releases soon.  We
>>>>would
>>>> call these releases 2.x, to stress the non-backward compatible changes
>>>>such
>>>> as dropping Hadoop-1.  This will give users a chance to play with the
>>>>new
>>>> code and developers a chance to get feedback.
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>>
>>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Sergey Shelukhin <se...@hortonworks.com>.

I think we need some path for deprecating old Hadoop versions, the same
way we deprecate old Java version support or old RDBMS version support.
At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases.
The branches are a graceful way to get rid of the legacy burden.

Alternatively, when sweeping changes are made, we can do what HBase did
(which is not pretty imho), where the 0.94 version had ~30 dot releases
because people cannot upgrade to the 0.96 “singularity” release.


I posit that people who run Hadoop 1 and MR in this day and age (and more
so as time passes) are people who don’t care about perf and new
features, only stability; so a stability-focused branch would be perfect to
support them.


On 15/5/18, 10:04, "Edward Capriolo" <ed...@gmail.com> wrote:

>Up until recently Hive supported numerous versions of Hadoop code base
>with
>a simple shim layer. I would rather we stick to the shim layer. I think
>easily the best part about hive was that a single release worked
>well regardless of your hadoop version. It was also a key element to
>hive's
>success. I do not want to see us have multiple branches.
>
>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com> wrote:
>
>> Thanks for the explanation, Alan!
>>
>> While I have understood more on the proposal, I actually see more
>>problems
>> than the confusion of two lines of releases. Essentially, this proposal
>> forces a user to make a hard choice between a stabler, legacy-aware
>>release
>> line and an adventurous, pioneering release line. And once the choice is
>> made, there is no easy way back or forward.
>>
>> Here is my interpretation. Let's say we have two main branches as
>> proposed. I develop a new feature which I think useful for both
>>branches.
>> So, I commit it to both branches. My feature requires additional schema
>> support, so I provide upgrade scripts for both branches. The scripts are
>> different because the two branches have already diverged in schema.
>>
>> Now the two branches evolve in a diverging fashion like this. This is
>>all
>> good as long as a user stays in his line. The moment the user considers
>>a
>> switch, most likely, from branch-1 to branch-2, he is stuck. Why?
>>Because
>> there is no upgrade path from a release in branch-1 to a release in
>> branch-2!
>>
>> If we want to provide an upgrade path, then there will be MxN paths,
>>where
>> M and N are the number of releases in the two branches, respectively.
>>This
>> is going to be next to a nightmare, not only for users, but also for us.
>>
>> Also, the proposal will require two sets of things that Hive provides:
>> double documentation, double feature tracking, double build/test
>> infrastructures, etc.
>>
>> This approach can also potentially cause the problem we saw in hadoop
>> releases, where 0.23 release was greater than 1.0 release.
>>
>> To me, the problem we are trying to solve is deprecating old things such as
>> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
>> however, we approached the problem in less favorable ways.
>>
>> First, it seemed we wanted to deprecate something just for the sake of
>> deprecation, and it's not based on the rationale that supports the
>>desire.
>> A dev might write code that accidentally breaks the hadoop-1 build. However,
>>this
>> is more a build infrastructure problem rather than the burden of
>>supporting
>> hadoop-1. If our build could catch it at precommit test, then I would
>>think
>> the accident can be well avoided. Most of the times, fixing the build is
>> trivial. And we have already addressed the build infrastructure problem.
>>
>> Secondly, if we do have a strong reason to deprecate something, we
>>should
>> have a deprecation plan rather than declaring on the spot that the
>>current
>> release is the last one supporting X. I think Microsoft did a better
>>job in
>> terms of product deprecation. For instance, they announced the end of
>> Windows XP support long before the last day. In my opinion, we should have a
>>similar
>> vision, giving users, distributions enough time to adjust rather than
>> shocking them with breaking news.
>>
>> In summary, I do see the need of deprecation in Hive, but I am afraid
>>the
>> way we take, including the proposal here, isn't going to nicely solve
>>the
>> problem. On the contrary, I foresee a spectrum of confusion,
>>frustration,
>> and burden for the user as well as for developers.
>>
>> Thanks,
>> Xuefu
>>
>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com>
>>wrote:
>>
>>>
>>>
>>>   Xuefu Zhang <xz...@cloudera.com>
>>>  May 15, 2015 at 17:31
>>>
>>> Just make sure that I understand the proposal correctly: we are going
>>>to
>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>
>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>is
>>> already well established.
>>>
>>>  New features
>>> are only merged to branch-2. That essentially says we stop development
>>>for
>>> hadoop-1, right?
>>>
>>>  If developers want to keep contributing patches to branch-1 then
>>> there's no need for it to stop.  We would want to avoid putting new
>>> features only on branch-1, unless they only made sense in that context.
>>> But I assume we'll see people contributing to branch-1 for some time.
>>>
>>>  Are we also making two lines of releases: one for branch-1
>>> and one for branch-2? Won't that be confusing and also burdensome if we
>>> release say 1.3, 2.0, 2.1, 1.4...
>>>
>>>  I'm asserting that it will be less confusing than the alternatives.
>>>We
>>> need some way to make early releases of many of the new features.  I
>>> believe that this proposal is less confusing than if we start putting
>>>the
>>> new features in 1.x branches.  This is particularly true because it
>>>would
>>> help us to start being able to drop older functionality like Hadoop-1
>>>and
>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>users.
>>>
>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>
>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>
>>> Alan.
>>>
>>>  Thanks,
>>> Xuefu
>>>
>>>
>>>
>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>> <vgumashta@hortonworks.com> wrote:
>>>
>>>  +1 on the new branch. I think it’ll help in faster dev time for these
>>> important changes.
>>>
>>>  —Vaibhav
>>>
>>>   From: Alan Gates <al...@gmail.com>
>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>>> Date: Friday, May 15, 2015 at 4:11 PM
>>> To: "dev@hive.apache.org" <de...@hive.apache.org>
>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>
>>>  Anyone else have feedback on this?  If not I'll start a vote next
>>>week.
>>>
>>> Alan.
>>>
>>>    Gopal Vijayaraghavan <go...@apache.org>
>>> May 14, 2015 at 10:44
>>>   Hi,
>>>
>>> +1 on the idea.
>>>
>>> Having a stable release branch with ongoing fixes where we do not drop
>>> major features would be good all around.
>>>
>>> It lets us accelerate the pace of development, drop major features or
>>> rewrite them entirely without dragging everyone else kicking &
>>>screaming
>>> into that release.
>>>
>>> Cheers,
>>> Gopal
>>>
>>>
>>>
>>>    Sergey Shelukhin <se...@hortonworks.com>
>>> May 11, 2015 at 19:17
>>>   That sounds like a good idea.
>>> Some features could be back ported to branch-1 if viable, but at least
>>>new
>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>> Probably also a good place to enable vectorization and other perf
>>>features
>>> by default while we make alpha releases.
>>>
>>> +1
>>>
>>>
>>>    Alan Gates <al...@gmail.com>
>>> May 11, 2015 at 15:38
>>>   There is a lot of forward-looking work going on in various branches of
>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>>> be good to have a way to release this code to users so that they can
>>> experiment with it.  Releasing it will also provide feedback to developers.
>>>
>>> At the same time there are discussions on whether to keep supporting
>>> Hadoop-1.  The burden of supporting older, less used functionality such as
>>> Hadoop-1 is becoming ever harder as many new features are added.
>>>
>>> I propose that the best way to deal with this would be to make a
>>> branch-1.  We could continue to make new feature releases off of this
>>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>> This provides stability and continuity for users and developers.
>>>
>>> We could then merge these new features branches (LLAP, HBase metastore,
>>> CLI drop) into the trunk, as well as turn on by default newer features such
>>> as the vectorization and ACID.  We could also drop older, less used
>>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>>> before we are ready to make stable, production ready releases of this
>>> code.  But we could start making alpha quality releases soon.  We would
>>> call these releases 2.x, to stress the non-backward compatible changes such
>>> as dropping Hadoop-1.  This will give users a chance to play with the new
>>> code and developers a chance to get feedback.
>>>
>>> Thoughts?
>>>
>>>
>>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Owen O'Malley <om...@apache.org>.

I think that it is past time for Hive to have a "stable" and "next" branch.
Every release from Hive 0.11 to Hive 1.2 has been a major release in terms
of changes and functionality. Part of what we've been missing is a way of
making stable releases that don't move as fast and support the customers
with minor new features, but no big sweeping changes. That will be a win
for users.

I'm +1 on Alan's plan of making a new release branch.

.. Owen

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.


> Edward Capriolo <ed...@gmail.com>
> May 18, 2015 at 10:14
> This concept of "experimental features" basically translates to "I do not
> have the time to care about people not using my version".
No, it does not.  Continuing to support old features is a cost/benefit
trade-off, both for developers and users.  The cost for developers is
continuing to work around older code; the cost for users is that they get
fewer new features, fewer performance improvements, and fewer stability
improvements because developers are spending time working around the old
code.

At some point in the cost/benefit analysis the costs are high enough 
that it makes sense to stop supporting it.  I am asserting that we are 
at that point.

Caring about people not on the latest version is an important part of 
what I am proposing.  There are still many users using Hive either on 
Hadoop 1 or for more traditional Hive workloads (batch, ETL).  It is 
important to give these users a good path forward.  My assertion is that 
a branch-1 is the best way to do this.

So to continue in the cost/benefit paradigm, what I have proposed does
have an additional cost for developers.  As I have said in my responses
to Xuefu, I don't think this cost is too bad, and I assert that it is
less than that of continuing to carry forward older functionality ad
infinitum.  My intent is that for users who are not interested in new
features or workloads the cost is at or near zero.  Customers interested
in newer functionality will continue to have to pay the cost of upgrades,
but that is true anyway.

Alan.

> I do not see it
> as good. We have seen what happened to upstream Hadoop: there was this gap
> between 0.21 and ??.....??. No one was clear what the API was (mapred,
> new mapreduce), and no one knew what to link off of: cdh?, vanilla?, yahoo
> distribution?
>
> IMHO, this is just going to increase fragmentation.
>
> On Mon, May 18, 2015 at 1:04 PM, Edward Capriolo <ed...@gmail.com>
>
> Edward Capriolo <ed...@gmail.com>
> May 18, 2015 at 10:04
> Up until recently Hive supported numerous versions of the Hadoop code base
> with a simple shim layer. I would rather we stick to the shim layer. I think
> easily the best part about Hive was that a single release worked well
> regardless of your Hadoop version. It was also a key element of Hive's
> success. I do not want to see us have multiple branches.
>
>
> Xuefu Zhang <xz...@cloudera.com>
> May 15, 2015 at 22:29
> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more 
> problems than the confusion of two lines of releases. Essentially, 
> this proposal forces a user to make a hard choice between a stabler, 
> legacy-aware release line and an adventurous, pioneering release line. 
> And once the choice is made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as 
> proposed. I develop a new feature which I think is useful for both
> branches. So, I commit it to both branches. My feature requires 
> additional schema support, so I provide upgrade scripts for both 
> branches. The scripts are different because the two branches have 
> already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is 
> all good as long as a user stays in his line. The moment the user 
> considers a switch, most likely, from branch-1 to branch-2, he is
> stuck. Why? Because there is no upgrade path from a release in 
> branch-1 to a release in branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, 
> where M and N are the number of releases in the two branches, 
> respectively. This is going to be next to a nightmare, not only for 
> users, but also for us.
>
> Also, the proposal will require two sets of things that Hive provides: 
> double documentation, double feature tracking, double build/test 
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in hadoop 
> releases, where 0.23 release was greater than 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things such as
> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
> however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, and it's not based on the rationale that supports the desire.
> A dev might write code that accidentally breaks the hadoop-1 build. However,
> this is more a build infrastructure problem than the burden of supporting
> hadoop-1. If our build could catch it at precommit test, then I would think
> the accident can be well avoided. Most of the time, fixing the build is
> trivial. And we have already addressed the build infrastructure problem.
>
> Secondly, if we do have a strong reason to deprecate something, we should
> have a deprecation plan rather than declaring on the spot that the current
> release is the last one supporting X. I think Microsoft did a better job in
> terms of product deprecation. For instance, they announced the end of Windows
> XP support long before the last day. In my opinion, we should have a similar
> vision, giving users and distributions enough time to adjust rather than
> shocking them with breaking news.
>
> In summary, I do see the need for deprecation in Hive, but I am afraid the
> path we are taking, including the proposal here, isn't going to nicely solve
> the problem. On the contrary, I foresee a spectrum of confusion, frustration,
> and burden for users as well as for developers.
>
> Thanks,
> Xuefu
>
>
> Xuefu Zhang <xz...@cloudera.com>
> May 15, 2015 at 17:31
> Just make sure that I understand the proposal correctly: we are going to
> have two main branches, one for hadoop-1 and one for hadoop-2. New features
> are only merged to branch-2. That essentially says we stop development for
> hadoop-1, right? Are we also making two lines of releases: one for branch-1
> and one for branch-2? Won't that be confusing and also burdensome if we
> release say 1.3, 2.0, 2.1, 1.4...
>
> Please note that we will have hadoop 3 soon. What's the story there?
>
> Thanks,
> Xuefu
>
>
>
> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com> wrote:
>
>>   +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>>   —Vaibhav
>>
>>    From: Alan Gates<al...@gmail.com>
>> Reply-To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>>   Anyone else have feedback on this?  If not I'll start a vote next week.
>>
>> Alan.
>>
>>     Gopal Vijayaraghavan<go...@apache.org>
>> May 14, 2015 at 10:44
>>    Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>     Sergey Shelukhin<se...@hortonworks.com>
>> May 11, 2015 at 19:17
>>    That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>>
>>     Alan Gates<al...@gmail.com>
>> May 11, 2015 at 15:38
>>    There is a lot of forward-looking work going on in various branches of
>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>> be good to have a way to release this code to users so that they can
>> experiment with it.  Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1.  The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever harder as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1.  We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new features branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as the vectorization and ACID.  We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>> before we are ready to make stable, production ready releases of this
>> code.  But we could start making alpha quality releases soon.  We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1.  This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?
>>
>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Edward Capriolo <ed...@gmail.com>.

This concept of "experimental features" basically translates to "I do not
have the time to care about people not using my version". I do not see it
as good. We have seen what happened to upstream Hadoop: there was this gap
between 0.21 and ??.....??. No one was clear what the API was (mapred,
new mapreduce), and no one knew what to link off of: cdh?, vanilla?, yahoo
distribution?

IMHO, this is just going to increase fragmentation.

On Mon, May 18, 2015 at 1:04 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> Up until recently Hive supported numerous versions of the Hadoop code base
> with a simple shim layer. I would rather we stick to the shim layer. I
> think easily the best part about Hive was that a single release worked
> well regardless of your Hadoop version. It was also a key element of
> Hive's success. I do not want to see us have multiple branches.
>
> On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com> wrote:
>
>> Thanks for the explanation, Alan!
>>
>> While I have understood more on the proposal, I actually see more
>> problems than the confusion of two lines of releases. Essentially, this
>> proposal forces a user to make a hard choice between a stabler,
>> legacy-aware release line and an adventurous, pioneering release line. And
>> once the choice is made, there is no easy way back or forward.
>>
>> Here is my interpretation. Let's say we have two main branches as
>> proposed. I develop a new feature which I think is useful for both branches.
>> So, I commit it to both branches. My feature requires additional schema
>> support, so I provide upgrade scripts for both branches. The scripts are
>> different because the two branches have already diverged in schema.
>>
>> Now the two branches evolve in a diverging fashion like this. This is all
>> good as long as a user stays in his line. The moment the user considers a
>> switch, most likely, from branch-1 to branch-2, he is stuck. Why? Because
>> there is no upgrade path from a release in branch-1 to a release in
>> branch-2!
>>
>> If we want to provide an upgrade path, then there will be MxN paths,
>> where M and N are the number of releases in the two branches, respectively.
>> This is going to be next to a nightmare, not only for users, but also for
>> us.
>>
>> Also, the proposal will require two sets of things that Hive provides:
>> double documentation, double feature tracking, double build/test
>> infrastructures, etc.
>>
>> This approach can also potentially cause the problem we saw in hadoop
>> releases, where 0.23 release was greater than 1.0 release.
>>
>> To me, the problem we are trying to solve is deprecating old things such as
>> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
>> however, we approached the problem in less favorable ways.
>>
>> First, it seemed we wanted to deprecate something just for the sake of
>> deprecation, and it's not based on the rationale that supports the desire.
>> A dev might write code that accidentally breaks the hadoop-1 build. However,
>> this is more a build infrastructure problem than the burden of supporting
>> hadoop-1. If our build could catch it at precommit test, then I would think
>> the accident can be well avoided. Most of the time, fixing the build is
>> trivial. And we have already addressed the build infrastructure problem.
>>
>> Secondly, if we do have a strong reason to deprecate something, we should
>> have a deprecation plan rather than declaring on the spot that the current
>> release is the last one supporting X. I think Microsoft did a better job in
>> terms of product deprecation. For instance, they announced the end of Windows
>> XP support long before the last day. In my opinion, we should have a similar
>> vision, giving users and distributions enough time to adjust rather than
>> shocking them with breaking news.
>>
>> In summary, I do see the need for deprecation in Hive, but I am afraid the
>> path we are taking, including the proposal here, isn't going to nicely solve
>> the problem. On the contrary, I foresee a spectrum of confusion, frustration,
>> and burden for users as well as for developers.
>>
>> Thanks,
>> Xuefu
>>
>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com> wrote:
>>
>>>
>>>
>>>   Xuefu Zhang <xz...@cloudera.com>
>>>  May 15, 2015 at 17:31
>>>
>>> Just make sure that I understand the proposal correctly: we are going to
>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>
>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2 is
>>> already well established.
>>>
>>>  New features
>>> are only merged to branch-2. That essentially says we stop development for
>>> hadoop-1, right?
>>>
>>>  If developers want to keep contributing patches to branch-1 then
>>> there's no need for it to stop.  We would want to avoid putting new
>>> features only on branch-1, unless they only made sense in that context.
>>> But I assume we'll see people contributing to branch-1 for some time.
>>>
>>>  Are we also making two lines of releases: one for branch-1
>>> and one for branch-2? Won't that be confusing and also burdensome if we
>>> release say 1.3, 2.0, 2.1, 1.4...
>>>
>>>  I'm asserting that it will be less confusing than the alternatives.
>>> We need some way to make early releases of many of the new features.  I
>>> believe that this proposal is less confusing than if we start putting the
>>> new features in 1.x branches.  This is particularly true because it would
>>> help us to start being able to drop older functionality like Hadoop-1 and
>>> MapReduce, which is very hard to do in the 1.x line without stranding users.
>>>
>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>
>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>
>>> Alan.
>>>
>>>  Thanks,
>>> Xuefu
>>>
>>>
>>>
>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com> wrote:
>>>
>>>  +1 on the new branch. I think it’ll help in faster dev time for these
>>> important changes.
>>>
>>>  —Vaibhav
>>>
>>>   From: Alan Gates <al...@gmail.com>
>>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>>> Date: Friday, May 15, 2015 at 4:11 PM
>>> To: "dev@hive.apache.org" <de...@hive.apache.org>
>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>
>>>  Anyone else have feedback on this?  If not I'll start a vote next week.
>>>
>>> Alan.
>>>
>>>    Gopal Vijayaraghavan <go...@apache.org>
>>> May 14, 2015 at 10:44
>>>   Hi,
>>>
>>> +1 on the idea.
>>>
>>> Having a stable release branch with ongoing fixes where we do not drop
>>> major features would be good all around.
>>>
>>> It lets us accelerate the pace of development, drop major features or
>>> rewrite them entirely without dragging everyone else kicking & screaming
>>> into that release.
>>>
>>> Cheers,
>>> Gopal
>>>
>>>
>>>
>>>    Sergey Shelukhin <se...@hortonworks.com>
>>> May 11, 2015 at 19:17
>>>   That sounds like a good idea.
>>> Some features could be back ported to branch-1 if viable, but at least new
>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>> Probably also a good place to enable vectorization and other perf features
>>> by default while we make alpha releases.
>>>
>>> +1
>>>
>>>
>>>    Alan Gates <al...@gmail.com>
>>> May 11, 2015 at 15:38
>>>   There is a lot of forward-looking work going on in various branches of
>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>>> be good to have a way to release this code to users so that they can
>>> experiment with it.  Releasing it will also provide feedback to developers.
>>>
>>> At the same time there are discussions on whether to keep supporting
>>> Hadoop-1.  The burden of supporting older, less used functionality such as
>>> Hadoop-1 is becoming ever harder as many new features are added.
>>>
>>> I propose that the best way to deal with this would be to make a
>>> branch-1.  We could continue to make new feature releases off of this
>>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>> This provides stability and continuity for users and developers.
>>>
>>> We could then merge these new features branches (LLAP, HBase metastore,
>>> CLI drop) into the trunk, as well as turn on by default newer features such
>>> as the vectorization and ACID.  We could also drop older, less used
>>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>>> before we are ready to make stable, production ready releases of this
>>> code.  But we could start making alpha quality releases soon.  We would
>>> call these releases 2.x, to stress the non-backward compatible changes such
>>> as dropping Hadoop-1.  This will give users a chance to play with the new
>>> code and developers a chance to get feedback.
>>>
>>> Thoughts?
>>>
>>>
>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Edward Capriolo <ed...@gmail.com>.

Up until recently Hive supported numerous versions of the Hadoop code base
with a simple shim layer. I would rather we stick to the shim layer. I think
easily the best part about Hive was that a single release worked well
regardless of your Hadoop version. It was also a key element of Hive's
success. I do not want to see us have multiple branches.
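
For readers unfamiliar with the pattern, here is a minimal sketch of what a shim layer does, written in Python for brevity. The names (`HadoopShim`, `Hadoop1Shim`, `Hadoop2Shim`, `load_shim`) are hypothetical and chosen only for illustration; this is not Hive's actual shim code. The idea is that the rest of the code base talks to one version-neutral interface, and a loader picks the implementation that matches the Hadoop version found at runtime.

```python
from abc import ABC, abstractmethod


class HadoopShim(ABC):
    """Hypothetical version-neutral interface that callers depend on."""

    @abstractmethod
    def scheduler_name(self) -> str:
        """Name of the component that schedules jobs in this Hadoop line."""


class Hadoop1Shim(HadoopShim):
    def scheduler_name(self) -> str:
        # Hadoop 1 schedules MapReduce jobs through the JobTracker.
        return "JobTracker"


class Hadoop2Shim(HadoopShim):
    def scheduler_name(self) -> str:
        # Hadoop 2 schedules applications through the YARN ResourceManager.
        return "YARN ResourceManager"


# Illustrative mapping from Hadoop major version to shim class.
_SHIMS = {"1": Hadoop1Shim, "2": Hadoop2Shim}


def load_shim(hadoop_version: str) -> HadoopShim:
    """Return the shim for the given version string, e.g. "2.6.0"."""
    major = hadoop_version.split(".")[0]
    if major not in _SHIMS:
        raise ValueError(f"no shim available for Hadoop {hadoop_version}")
    return _SHIMS[major]()


if __name__ == "__main__":
    # Callers never branch on the Hadoop version themselves.
    for version in ("1.2.1", "2.6.0"):
        print(version, "->", load_shim(version).scheduler_name())
```

With this shape a single release can carry every shim, which is the property being argued for here. The cost is that each new feature has to fit behind the common interface for all supported versions.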

On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xz...@cloudera.com> wrote:

> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more problems
> than the confusion of two lines of releases. Essentially, this proposal
> forces a user to make a hard choice between a stabler, legacy-aware release
> line and an adventurous, pioneering release line. And once the choice is
> made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as
> proposed. I develop a new feature which I think is useful for both branches.
> So, I commit it to both branches. My feature requires additional schema
> support, so I provide upgrade scripts for both branches. The scripts are
> different because the two branches have already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is all
> good as long as a user stays in his line. The moment the user considers a
> switch, most likely, from branch-1 to branch-2, he is stuck. Why? Because
> there is no upgrade path from a release in branch-1 to a release in
> branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, where
> M and N are the number of releases in the two branches, respectively. This
> is going to be next to a nightmare, not only for users, but also for us.
>
> Also, the proposal will require two sets of things that Hive provides:
> double documentation, double feature tracking, double build/test
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in hadoop
> releases, where 0.23 release was greater than 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things such as
> hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
> however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, and it's not based on the rationale that supports the desire.
> A dev might write code that accidentally breaks the hadoop-1 build. However,
> this is more a build infrastructure problem than the burden of supporting
> hadoop-1. If our build could catch it at precommit test, then I would think
> the accident can be well avoided. Most of the time, fixing the build is
> trivial. And we have already addressed the build infrastructure problem.
>
> Secondly, if we do have a strong reason to deprecate something, we should
> have a deprecation plan rather than declaring on the spot that the current
> release is the last one supporting X. I think Microsoft did a better job in
> terms of product deprecation. For instance, they announced the end of Windows
> XP support long before the last day. In my opinion, we should have a similar
> vision, giving users and distributions enough time to adjust rather than
> shocking them with breaking news.
>
> In summary, I do see the need for deprecation in Hive, but I am afraid the
> path we are taking, including the proposal here, isn't going to nicely solve
> the problem. On the contrary, I foresee a spectrum of confusion, frustration,
> and burden for users as well as for developers.
>
> Thanks,
> Xuefu
>
> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com> wrote:
>
>>
>>
>>   Xuefu Zhang <xz...@cloudera.com>
>>  May 15, 2015 at 17:31
>>
>> Just make sure that I understand the proposal correctly: we are going to
>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>
>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>> It will be some time before Hive's branch-2 is stable, while Hadoop-2 is
>> already well established.
>>
>>  New features
>> are only merged to branch-2. That essentially says we stop development for
>> hadoop-1, right?
>>
>>  If developers want to keep contributing patches to branch-1 then
>> there's no need for it to stop.  We would want to avoid putting new
>> features only on branch-1, unless they only made sense in that context.
>> But I assume we'll see people contributing to branch-1 for some time.
>>
>>  Are we also making two lines of releases: one for branch-1
>> and one for branch-2? Won't that be confusing and also burdensome if we
>> release say 1.3, 2.0, 2.1, 1.4...
>>
>>  I'm asserting that it will be less confusing than the alternatives.  We
>> need some way to make early releases of many of the new features.  I
>> believe that this proposal is less confusing than if we start putting the
>> new features in 1.x branches.  This is particularly true because it would
>> help us to start being able to drop older functionality like Hadoop-1 and
>> MapReduce, which is very hard to do in the 1.x line without stranding users.
>>
>>  Please note that we will have hadoop 3 soon. What's the story there?
>>
>>  As I said above, I don't see this as tied to Hadoop versions.
>>
>> Alan.
>>
>>  Thanks,
>> Xuefu
>>
>>
>>
>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com>
>> wrote:
>>
>>  +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>>  —Vaibhav
>>
>>   From: Alan Gates <al...@gmail.com>
>> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org" <de...@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>>  Anyone else have feedback on this?  If not I'll start a vote next week.
>>
>> Alan.
>>
>>    Gopal Vijayaraghavan <go...@apache.org> <go...@apache.org>
>> May 14, 2015 at 10:44
>>   Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>    Sergey Shelukhin <se...@hortonworks.com> <se...@hortonworks.com>
>> May 11, 2015 at 19:17
>>   That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>>
>>    Alan Gates <al...@gmail.com> <al...@gmail.com>
>> May 11, 2015 at 15:38
>>   There is a lot of forward-looking work going on in various branches of
>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>> be good to have a way to release this code to users so that they can
>> experiment with it.  Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1.  The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever harder as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1.  We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new features branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as the vectorization and ACID.  We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>> before we are ready to make stable, production ready releases of this
>> code.  But we could start making alpha quality releases soon.  We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1.  This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?
>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.


> Xuefu Zhang <ma...@cloudera.com>
> May 15, 2015 at 22:29
> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more 
> problems than the confusion of two lines of releases. Essentially, 
> this proposal forces a user to make a hard choice between a stabler, 
> legacy-aware release line and an adventurous, pioneering release line. 
> And once the choice is made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as 
> proposed. I develop a new feature which I think useful for both 
> branches. So, I commit it to both branches. My feature requires 
> additional schema support, so I provide upgrade scripts for both 
> branches. The scripts are different because the two branches have 
> already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is 
> all good as long as a user stays in his line. The moment the user 
> considers a switch, most likely from branch-1 to branch-2, he is 
> stuck. Why? Because there is no upgrade path from a release in 
> branch-1 to a release in branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, 
> where M and N are the number of releases in the two branches, 
> respectively. This is going to be next to a nightmare, not only for 
> users, but also for us.
MxN would indeed be bad, but there is no reason to approach it that 
way.  It's highly unlikely that users will want to migrate from 2.x -> 
1.y.  And for a given 1.x release, we can assume that users will want to 
be able to migrate to the current head of branch for 2.y.  So this means 
we would need two upgrade scripts from each 1.x release.  This is extra 
effort but it is not that bad.
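
To put rough numbers on the two approaches, here is a minimal sketch. It
assumes one reading of "two upgrade scripts from each 1.x release": one
script to the next 1.x release and one to the current head of the 2.y
line. The function names are hypothetical and exist only for this
illustration; nothing here is part of Hive.

```python
def full_matrix_paths(m: int, n: int) -> int:
    """Upgrade paths needed if every one of m 1.x releases
    must be able to reach every one of n 2.y releases."""
    return m * n


def head_only_scripts(m: int) -> int:
    """Upgrade scripts needed if each of m 1.x releases ships two:
    one to the next 1.x release and one to the current 2.y head
    (an assumed reading of the reply above)."""
    return 2 * m


if __name__ == "__main__":
    # Illustrative release counts only, not actual Hive history.
    m, n = 4, 3
    print("full matrix:", full_matrix_paths(m, n))
    print("head only:  ", head_only_scripts(m))
```

Under this reading the script count grows with the number of 1.x
releases only, and no longer depends on how many 2.y releases exist.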
>
> Also, the proposal will require two sets of things that Hive provides: 
> double documentation, double feature tracking, double build/test 
> infrastructures, etc.
Our documentation already handles the fact that certain features are 
only supported in certain releases.  Our test and build infrastructure 
can already be made to work on multiple branches.  I'm not sure what you 
mean by double feature tracking.
>
> This approach can also potentially cause the problem we saw in hadoop 
> releases, where 0.23 release was greater than 1.0 release.
I'm sorry, I don't follow what you're saying here.  You mean the numbers 
are just bigger (like 23 > 1)?  We already have that problem, this 
doesn't make it worse.
>
> To me, the problem we are trying to solve is deprecating old things 
> such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. 
> As I see it, however, we approached the problem in less favorable ways.
That is only one of the two problems.  The other is to provide a 
mechanism for experimental features.
>
> First, it seemed we wanted to deprecate something just for the sake of 
> deprecation, without a rationale that supports the desire. A developer 
> might write code that accidentally breaks the hadoop-1 build. 
> However, this is more a build infrastructure problem than a 
> burden of supporting hadoop-1. If our build could catch it at 
> precommit test, then I would think the accident can be well avoided. 
> Most of the time, fixing the build is trivial. And we have already 
> addressed the build infrastructure problem.
>
> Secondly, if we do have a strong reason to deprecate something, we 
> should have a deprecation plan rather than declaring on the spot that 
> the current release is the last one supporting X. I think Microsoft 
> did a better job in terms of product deprecation. For instance, they 
> announced the end of support for Windows XP long before the last day. 
> In my opinion, we should have a similar vision, giving users and 
> distributions enough time to adjust rather than shocking them with 
> breaking news.
>
> In summary, I do see the need of deprecation in Hive, but I am afraid 
> the way we take, including the proposal here, isn't going to nicely 
> solve the problem. On the contrary, I foresee a spectrum of confusion, 
> frustration, and burden for the user as well as for developers.
>
> Thanks,
> Xuefu
>
>
> Xuefu Zhang <ma...@cloudera.com>
> May 15, 2015 at 17:31
> Just make sure that I understand the proposal correctly: we are going to
> have two main branches, one for hadoop-1 and one for hadoop-2. New features
> are only merged to branch-2. That essentially says we stop development for
> hadoop-1, right? Are we also making two lines of releases: one for branch-1
> and one for branch-2? Won't that be confusing and also burdensome if we
> release say 1.3, 2.0, 2.1, 1.4...
>
> Please note that we will have hadoop 3 soon. What's the story there?
>
> Thanks,
> Xuefu
>
>
>
> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com>
> wrote:
>
>>   +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>>   —Vaibhav
>>
>>    From: Alan Gates<al...@gmail.com>
>> Reply-To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>>   Anyone else have feedback on this?  If not I'll start a vote next week.
>>
>> Alan.
>>
>>     Gopal Vijayaraghavan<go...@apache.org>
>> May 14, 2015 at 10:44
>>    Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>     Sergey Shelukhin<se...@hortonworks.com>
>> May 11, 2015 at 19:17
>>    That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>>
>>     Alan Gates<al...@gmail.com>
>> May 11, 2015 at 15:38
>>    There is a lot of forward-looking work going on in various branches of
>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>> be good to have a way to release this code to users so that they can
>> experiment with it.  Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1.  The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever harder as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1.  We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new features branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as the vectorization and ACID.  We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>> before we are ready to make stable, production ready releases of this
>> code.  But we could start making alpha quality releases soon.  We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1.  This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?
>>
>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Xuefu Zhang <xz...@cloudera.com>.

Thanks for the explanation, Alan!

While I have understood more on the proposal, I actually see more problems
than the confusion of two lines of releases. Essentially, this proposal
forces a user to make a hard choice between a stabler, legacy-aware release
line and an adventurous, pioneering release line. And once the choice is
made, there is no easy way back or forward.

Here is my interpretation. Let's say we have two main branches as proposed.
I develop a new feature which I think useful for both branches. So, I
commit it to both branches. My feature requires additional schema support,
so I provide upgrade scripts for both branches. The scripts are different
because the two branches have already diverged in schema.

Now the two branches evolve in a diverging fashion like this. This is all
good as long as a user stays in his line. The moment the user considers a
switch, most likely from branch-1 to branch-2, he is stuck. Why? Because
there is no upgrade path from a release in branch-1 to a release in
branch-2!

If we want to provide an upgrade path, then there will be MxN paths, where
M and N are the number of releases in the two branches, respectively. This
is going to be next to a nightmare, not only for users, but also for us.

Also, the proposal will require two sets of things that Hive provides:
double documentation, double feature tracking, double build/test
infrastructures, etc.

This approach can also potentially cause the problem we saw in hadoop
releases, where 0.23 release was greater than 1.0 release.

To me, the problem we are trying to solve is deprecating old things such as
hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
however, we approached the problem in less favorable ways.

First, it seemed we wanted to deprecate something just for the sake of
deprecation, without a rationale that supports the desire. A developer
might write code that accidentally breaks the hadoop-1 build. However, this
is more a build infrastructure problem than a burden of supporting
hadoop-1. If our build could catch it at precommit test, then I would think
the accident can be well avoided. Most of the time, fixing the build is
trivial. And we have already addressed the build infrastructure problem.

Secondly, if we do have a strong reason to deprecate something, we should
have a deprecation plan rather than declaring on the spot that the current
release is the last one supporting X. I think Microsoft did a better job in
terms of product deprecation. For instance, they announced the end of
support for Windows XP long before the last day. In my opinion, we should
have a similar vision, giving users and distributions enough time to adjust
rather than shocking them with breaking news.

In summary, I do see the need of deprecation in Hive, but I am afraid the
way we take, including the proposal here, isn't going to nicely solve the
problem. On the contrary, I foresee a spectrum of confusion, frustration,
and burden for the user as well as for developers.

Thanks,
Xuefu

On Fri, May 15, 2015 at 8:19 PM, Alan Gates <al...@gmail.com> wrote:

>
>
>   Xuefu Zhang <xz...@cloudera.com>
>  May 15, 2015 at 17:31
>
> Just make sure that I understand the proposal correctly: we are going to
> have two main branches, one for hadoop-1 and one for hadoop-2.
>
>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.  It
> will be some time before Hive's branch-2 is stable, while Hadoop-2 is
> already well established.
>
>  New features
> are only merged to branch-2. That essentially says we stop development for
> hadoop-1, right?
>
>  If developers want to keep contributing patches to branch-1 then there's
> no need for it to stop.  We would want to avoid putting new features only
> on branch-1, unless they only made sense in that context.  But I assume
> we'll see people contributing to branch-1 for some time.
>
>  Are we also making two lines of releases: one for branch-1
> and one for branch-2? Won't that be confusing and also burdensome if we
> release say 1.3, 2.0, 2.1, 1.4...
>
>  I'm asserting that it will be less confusing than the alternatives.  We
> need some way to make early releases of many of the new features.  I
> believe that this proposal is less confusing than if we start putting the
> new features in 1.x branches.  This is particularly true because it would
> help us to start being able to drop older functionality like Hadoop-1 and
> MapReduce, which is very hard to do in the 1.x line without stranding users.
>
>  Please note that we will have hadoop 3 soon. What's the story there?
>
>  As I said above, I don't see this as tied to Hadoop versions.
>
> Alan.
>
>
> Thanks,
> Xuefu
>
>
>
> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com>
> wrote:
>
>  +1 on the new branch. I think it’ll help in faster dev time for these
> important changes.
>
>  —Vaibhav
>
>   From: Alan Gates <al...@gmail.com>
> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
> Date: Friday, May 15, 2015 at 4:11 PM
> To: "dev@hive.apache.org" <de...@hive.apache.org>
> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>
>  Anyone else have feedback on this?  If not I'll start a vote next week.
>
> Alan.
>
>    Gopal Vijayaraghavan <go...@apache.org> <go...@apache.org>
> May 14, 2015 at 10:44
>   Hi,
>
> +1 on the idea.
>
> Having a stable release branch with ongoing fixes where we do not drop
> major features would be good all around.
>
> It lets us accelerate the pace of development, drop major features or
> rewrite them entirely without dragging everyone else kicking & screaming
> into that release.
>
> Cheers,
> Gopal
>
>
>
>    Sergey Shelukhin <se...@hortonworks.com> <se...@hortonworks.com>
> May 11, 2015 at 19:17
>   That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
>
>    Alan Gates <al...@gmail.com> <al...@gmail.com>
> May 11, 2015 at 15:38
>   There is a lot of forward-looking work going on in various branches of
> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
> be good to have a way to release this code to users so that they can
> experiment with it.  Releasing it will also provide feedback to developers.
>
> At the same time there are discussions on whether to keep supporting
> Hadoop-1.  The burden of supporting older, less used functionality such as
> Hadoop-1 is becoming ever harder as many new features are added.
>
> I propose that the best way to deal with this would be to make a
> branch-1.  We could continue to make new feature releases off of this
> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
> This provides stability and continuity for users and developers.
>
> We could then merge these new features branches (LLAP, HBase metastore,
> CLI drop) into the trunk, as well as turn on by default newer features such
> as the vectorization and ACID.  We could also drop older, less used
> features such as support for Hadoop-1 and MapReduce.  It will be a while
> before we are ready to make stable, production ready releases of this
> code.  But we could start making alpha quality releases soon.  We would
> call these releases 2.x, to stress the non-backward compatible changes such
> as dropping Hadoop-1.  This will give users a chance to play with the new
> code and developers a chance to get feedback.
>
> Thoughts?
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.


> Xuefu Zhang <ma...@cloudera.com>
> May 15, 2015 at 17:31
> Just make sure that I understand the proposal correctly: we are going to
> have two main branches, one for hadoop-1 and one for hadoop-2.
We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.  
It will be some time before Hive's branch-2 is stable, while Hadoop-2 is 
already well established.
> New features
> are only merged to branch-2. That essentially says we stop development for
> hadoop-1, right?
If developers want to keep contributing patches to branch-1 then there's 
no need for it to stop.  We would want to avoid putting new features 
only on branch-1, unless they only made sense in that context.  But I 
assume we'll see people contributing to branch-1 for some time.
> Are we also making two lines of releases: one for branch-1
> and one for branch-2? Won't that be confusing and also burdensome if we
> release say 1.3, 2.0, 2.1, 1.4...
I'm asserting that it will be less confusing than the alternatives.  We 
need some way to make early releases of many of the new features.  I 
believe that this proposal is less confusing than if we start putting 
the new features in 1.x branches.  This is particularly true because it 
would help us to start being able to drop older functionality like 
Hadoop-1 and MapReduce, which is very hard to do in the 1.x line without 
stranding users.
> Please note that we will have hadoop 3 soon. What's the story there?
As I said above, I don't see this as tied to Hadoop versions.

Alan.
>
> Thanks,
> Xuefu
>
>
>
> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com>
> wrote:
>
>>   +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>>   —Vaibhav
>>
>>    From: Alan Gates<al...@gmail.com>
>> Reply-To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org"<de...@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>>   Anyone else have feedback on this?  If not I'll start a vote next week.
>>
>> Alan.
>>
>>     Gopal Vijayaraghavan<go...@apache.org>
>> May 14, 2015 at 10:44
>>    Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>     Sergey Shelukhin<se...@hortonworks.com>
>> May 11, 2015 at 19:17
>>    That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>>
>>     Alan Gates<al...@gmail.com>
>> May 11, 2015 at 15:38
>>    There is a lot of forward-looking work going on in various branches of
>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>> be good to have a way to release this code to users so that they can
>> experiment with it.  Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1.  The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever harder as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1.  We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new features branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as the vectorization and ACID.  We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>> before we are ready to make stable, production ready releases of this
>> code.  But we could start making alpha quality releases soon.  We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1.  This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?
>>
>>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Xuefu Zhang <xz...@cloudera.com>.

Just to make sure that I understand the proposal correctly: we are going to
have two main branches, one for hadoop-1 and one for hadoop-2. New features
are only merged to branch-2. That essentially says we stop development for
hadoop-1, right? Are we also making two lines of releases: one for branch-1
and one for branch-2? Won't that be confusing and also burdensome if we
release say 1.3, 2.0, 2.1, 1.4...

Please note that we will have hadoop 3 soon. What's the story there?

Thanks,
Xuefu



On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumashta@hortonworks.com> wrote:

>  +1 on the new branch. I think it’ll help in faster dev time for these
> important changes.
>
>  —Vaibhav
>
>   From: Alan Gates <al...@gmail.com>
> Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
> Date: Friday, May 15, 2015 at 4:11 PM
> To: "dev@hive.apache.org" <de...@hive.apache.org>
> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>
>  Anyone else have feedback on this?  If not I'll start a vote next week.
>
> Alan.
>
>    Gopal Vijayaraghavan <go...@apache.org>
> May 14, 2015 at 10:44
>   Hi,
>
> +1 on the idea.
>
> Having a stable release branch with ongoing fixes where we do not drop
> major features would be good all around.
>
> It lets us accelerate the pace of development, drop major features or
> rewrite them entirely without dragging everyone else kicking & screaming
> into that release.
>
> Cheers,
> Gopal
>
>
>
>    Sergey Shelukhin <se...@hortonworks.com>
> May 11, 2015 at 19:17
>   That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
>
>    Alan Gates <al...@gmail.com>
> May 11, 2015 at 15:38
>   There is a lot of forward-looking work going on in various branches of
> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
> be good to have a way to release this code to users so that they can
> experiment with it.  Releasing it will also provide feedback to developers.
>
> At the same time there are discussions on whether to keep supporting
> Hadoop-1.  The burden of supporting older, less used functionality such as
> Hadoop-1 is becoming ever harder as many new features are added.
>
> I propose that the best way to deal with this would be to make a
> branch-1.  We could continue to make new feature releases off of this
> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
> This provides stability and continuity for users and developers.
>
> We could then merge these new features branches (LLAP, HBase metastore,
> CLI drop) into the trunk, as well as turn on by default newer features such
> as the vectorization and ACID.  We could also drop older, less used
> features such as support for Hadoop-1 and MapReduce.  It will be a while
> before we are ready to make stable, production ready releases of this
> code.  But we could start making alpha quality releases soon.  We would
> call these releases 2.x, to stress the non-backward compatible changes such
> as dropping Hadoop-1.  This will give users a chance to play with the new
> code and developers a chance to get feedback.
>
> Thoughts?
>
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.

+1 on the new branch. I think it'll help in faster dev time for these important changes.

-Vaibhav

From: Alan Gates <al...@gmail.com>
Reply-To: "dev@hive.apache.org" <de...@hive.apache.org>
Date: Friday, May 15, 2015 at 4:11 PM
To: "dev@hive.apache.org" <de...@hive.apache.org>
Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Anyone else have feedback on this?  If not I'll start a vote next week.

Alan.

Gopal Vijayaraghavan <ma...@apache.org>
May 14, 2015 at 10:44
Hi,

+1 on the idea.

Having a stable release branch with ongoing fixes where we do not drop
major features would be good all around.

It lets us accelerate the pace of development, drop major features or
rewrite them entirely without dragging everyone else kicking & screaming
into that release.

Cheers,
Gopal

Sergey Shelukhin <ma...@hortonworks.com>
May 11, 2015 at 19:17
That sounds like a good idea.
Some features could be back ported to branch-1 if viable, but at least new
stuff would not be burdened by Hadoop 1/MR code paths.
Probably also a good place to enable vectorization and other perf features
by default while we make alpha releases.

+1

Alan Gates <ma...@gmail.com>
May 11, 2015 at 15:38
There is a lot of forward-looking work going on in various branches of Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would be good to have a way to release this code to users so that they can experiment with it.  Releasing it will also provide feedback to developers.

At the same time there are discussions on whether to keep supporting Hadoop-1.  The burden of supporting older, less used functionality such as Hadoop-1 is becoming ever harder as many new features are added.

I propose that the best way to deal with this would be to make a branch-1.  We could continue to make new feature releases off of this branch (1.3, 1.4, etc.).  This branch would not drop old functionality.  This provides stability and continuity for users and developers.

We could then merge these new features branches (LLAP, HBase metastore, CLI drop) into the trunk, as well as turn on by default newer features such as the vectorization and ACID.  We could also drop older, less used features such as support for Hadoop-1 and MapReduce.  It will be a while before we are ready to make stable, production ready releases of this code.  But we could start making alpha quality releases soon.  We would call these releases 2.x, to stress the non-backward compatible changes such as dropping Hadoop-1.  This will give users a chance to play with the new code and developers a chance to get feedback.

Thoughts?

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Alan Gates <al...@gmail.com>.

Anyone else have feedback on this?  If not I'll start a vote next week.

Alan.

> Gopal Vijayaraghavan <ma...@apache.org>
> May 14, 2015 at 10:44
> Hi,
>
> +1 on the idea.
>
> Having a stable release branch with ongoing fixes where we do not drop
> major features would be good all around.
>
> It lets us accelerate the pace of development, drop major features or
> rewrite them entirely without dragging everyone else kicking & screaming
> into that release.
>
> Cheers,
> Gopal
>
>
>
> Sergey Shelukhin <ma...@hortonworks.com>
> May 11, 2015 at 19:17
> That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
>
> Alan Gates <ma...@gmail.com>
> May 11, 2015 at 15:38
> There is a lot of forward-looking work going on in various branches of 
> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It 
> would be good to have a way to release this code to users so that they 
> can experiment with it.  Releasing it will also provide feedback to 
> developers.
>
> At the same time there are discussions on whether to keep supporting 
> Hadoop-1.  The burden of supporting older, less used functionality 
> such as Hadoop-1 is becoming ever harder as many new features are added.
>
> I propose that the best way to deal with this would be to make a 
> branch-1.  We could continue to make new feature releases off of this 
> branch (1.3, 1.4, etc.).  This branch would not drop old 
> functionality.  This provides stability and continuity for users and 
> developers.
>
> We could then merge these new features branches (LLAP, HBase 
> metastore, CLI drop) into the trunk, as well as turn on by default 
> newer features such as the vectorization and ACID.  We could also drop 
> older, less used features such as support for Hadoop-1 and MapReduce.  
> It will be a while before we are ready to make stable, production 
> ready releases of this code.  But we could start making alpha quality 
> releases soon.  We would call these releases 2.x, to stress the 
> non-backward compatible changes such as dropping Hadoop-1.  This will 
> give users a chance to play with the new code and developers a chance 
> to get feedback.
>
> Thoughts?

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Gopal Vijayaraghavan <go...@apache.org>.

Hi,

+1 on the idea.

Having a stable release branch with ongoing fixes where we do not drop
major features would be good all around.

It lets us accelerate the pace of development, drop major features or
rewrite them entirely without dragging everyone else kicking & screaming
into that release.

Cheers,
Gopal

On 5/11/15, 7:17 PM, "Sergey Shelukhin" <se...@hortonworks.com> wrote:

>That sounds like a good idea.
>Some features could be back ported to branch-1 if viable, but at least new
>stuff would not be burdened by Hadoop 1/MR code paths.
>Probably also a good place to enable vectorization and other perf features
>by default while we make alpha releases.
>
>+1
>
>On 15/5/11, 15:38, "Alan Gates" <al...@gmail.com> wrote:
>
>>There is a lot of forward-looking work going on in various branches of
>>Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>would be good to have a way to release this code to users so that they
>>can experiment with it.  Releasing it will also provide feedback to
>>developers.
>>
>>At the same time there are discussions on whether to keep supporting
>>Hadoop-1.  The burden of supporting older, less used functionality such
>>as Hadoop-1 is becoming ever harder as many new features are added.
>>
>>I propose that the best way to deal with this would be to make a
>>branch-1.  We could continue to make new feature releases off of this
>>branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>This provides stability and continuity for users and developers.
>>
>>We could then merge these new features branches (LLAP, HBase metastore,
>>CLI drop) into the trunk, as well as turn on by default newer features
>>such as the vectorization and ACID.  We could also drop older, less used
>>features such as support for Hadoop-1 and MapReduce.  It will be a while
>>before we are ready to make stable, production ready releases of this
>>code.  But we could start making alpha quality releases soon.  We would
>>call these releases 2.x, to stress the non-backward compatible changes
>>such as dropping Hadoop-1.  This will give users a chance to play with
>>the new code and developers a chance to get feedback.
>>
>>Thoughts?
>

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Thejas Nair <th...@gmail.com>.

+1
This is great for development of new features in hive and making them
available to users. This also helps users who are slow to move to a new
version of hadoop; they can still get bug fixes and features compatible
with hadoop 1 in new hive 1.x releases.

It will also be easier for users to remember what hadoop version works
with what version of hive. (Hive 1.x needs hadoop 1+, hive 2.x needs).


On Mon, May 11, 2015 at 10:01 PM, Prasanth Jayachandran
<pj...@hortonworks.com> wrote:
> +1 for the proposal. A new branch definitely helps us move forward quickly with new features and deprecate the old stuff (20S shims and mapreduce).
>
> Thanks
> Prasanth
>
>
>
>
> On Mon, May 11, 2015 at 7:20 PM -0700, "Vikram Dixit K" <vi...@gmail.com> wrote:
>
> The proposal sounds good. Supporting and maintaining
> hadoop-1 is hard, and conflicting API changes in hadoop 2.x keep us
> from using new and better APIs, as doing so breaks compilation.
>
> +1
>
> Thanks
> Vikram.
>
> On Mon, May 11, 2015 at 7:17 PM, Sergey Shelukhin
> <se...@hortonworks.com> wrote:
>> That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>> On 15/5/11, 15:38, "Alan Gates" <al...@gmail.com> wrote:
>>
>>>There is a lot of forward-looking work going on in various branches of
>>>Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>>would be good to have a way to release this code to users so that they
>>>can experiment with it.  Releasing it will also provide feedback to
>>>developers.
>>>
>>>At the same time there are discussions on whether to keep supporting
>>>Hadoop-1.  The burden of supporting older, less used functionality such
>>>as Hadoop-1 is becoming ever harder as many new features are added.
>>>
>>>I propose that the best way to deal with this would be to make a
>>>branch-1.  We could continue to make new feature releases off of this
>>>branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>>This provides stability and continuity for users and developers.
>>>
>>>We could then merge these new features branches (LLAP, HBase metastore,
>>>CLI drop) into the trunk, as well as turn on by default newer features
>>>such as the vectorization and ACID.  We could also drop older, less used
>>>features such as support for Hadoop-1 and MapReduce.  It will be a while
>>>before we are ready to make stable, production ready releases of this
>>>code.  But we could start making alpha quality releases soon.  We would
>>>call these releases 2.x, to stress the non-backward compatible changes
>>>such as dropping Hadoop-1.  This will give users a chance to play with
>>>the new code and developers a chance to get feedback.
>>>
>>>Thoughts?
>>
>
>
>
> --
> Nothing better than when appreciated for hard work.
> -Mark

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.

+1 for the proposal. A new branch definitely helps us move forward quickly with new features and deprecate the old stuff (20S shims and mapreduce).

Thanks
Prasanth




On Mon, May 11, 2015 at 7:20 PM -0700, "Vikram Dixit K" <vi...@gmail.com> wrote:

The proposal sounds good. Supporting and maintaining
hadoop-1 is hard, and conflicting API changes in hadoop 2.x keep us
from using new and better APIs, as doing so breaks compilation.

+1

Thanks
Vikram.

On Mon, May 11, 2015 at 7:17 PM, Sergey Shelukhin
<se...@hortonworks.com> wrote:
> That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
> On 15/5/11, 15:38, "Alan Gates" <al...@gmail.com> wrote:
>
>>There is a lot of forward-looking work going on in various branches of
>>Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>would be good to have a way to release this code to users so that they
>>can experiment with it.  Releasing it will also provide feedback to
>>developers.
>>
>>At the same time there are discussions on whether to keep supporting
>>Hadoop-1.  The burden of supporting older, less used functionality such
>>as Hadoop-1 is becoming ever harder as many new features are added.
>>
>>I propose that the best way to deal with this would be to make a
>>branch-1.  We could continue to make new feature releases off of this
>>branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>This provides stability and continuity for users and developers.
>>
>>We could then merge these new features branches (LLAP, HBase metastore,
>>CLI drop) into the trunk, as well as turn on by default newer features
>>such as the vectorization and ACID.  We could also drop older, less used
>>features such as support for Hadoop-1 and MapReduce.  It will be a while
>>before we are ready to make stable, production ready releases of this
>>code.  But we could start making alpha quality releases soon.  We would
>>call these releases 2.x, to stress the non-backward compatible changes
>>such as dropping Hadoop-1.  This will give users a chance to play with
>>the new code and developers a chance to get feedback.
>>
>>Thoughts?
>



--
Nothing better than when appreciated for hard work.
-Mark

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Vikram Dixit K <vi...@gmail.com>.

The proposal sounds good. Supporting and maintaining
hadoop-1 is hard, and conflicting API changes in hadoop 2.x keep us
from using new and better APIs, as doing so breaks compilation.

+1

Thanks
Vikram.

On Mon, May 11, 2015 at 7:17 PM, Sergey Shelukhin
<se...@hortonworks.com> wrote:
> That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
> On 15/5/11, 15:38, "Alan Gates" <al...@gmail.com> wrote:
>
>>There is a lot of forward-looking work going on in various branches of
>>Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>would be good to have a way to release this code to users so that they
>>can experiment with it.  Releasing it will also provide feedback to
>>developers.
>>
>>At the same time there are discussions on whether to keep supporting
>>Hadoop-1.  The burden of supporting older, less used functionality such
>>as Hadoop-1 is becoming ever harder as many new features are added.
>>
>>I propose that the best way to deal with this would be to make a
>>branch-1.  We could continue to make new feature releases off of this
>>branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>>This provides stability and continuity for users and developers.
>>
>>We could then merge these new features branches (LLAP, HBase metastore,
>>CLI drop) into the trunk, as well as turn on by default newer features
>>such as the vectorization and ACID.  We could also drop older, less used
>>features such as support for Hadoop-1 and MapReduce.  It will be a while
>>before we are ready to make stable, production ready releases of this
>>code.  But we could start making alpha quality releases soon.  We would
>>call these releases 2.x, to stress the non-backward compatible changes
>>such as dropping Hadoop-1.  This will give users a chance to play with
>>the new code and developers a chance to get feedback.
>>
>>Thoughts?
>



-- 
Nothing better than when appreciated for hard work.
-Mark

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Posted by Sergey Shelukhin <se...@hortonworks.com>.

That sounds like a good idea.
Some features could be back ported to branch-1 if viable, but at least new
stuff would not be burdened by Hadoop 1/MR code paths.
Probably also a good place to enable vectorization and other perf features
by default while we make alpha releases.

+1

On 15/5/11, 15:38, "Alan Gates" <al...@gmail.com> wrote:

>There is a lot of forward-looking work going on in various branches of
>Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>would be good to have a way to release this code to users so that they
>can experiment with it.  Releasing it will also provide feedback to
>developers.
>
>At the same time there are discussions on whether to keep supporting
>Hadoop-1.  The burden of supporting older, less used functionality such
>as Hadoop-1 is becoming ever harder as many new features are added.
>
>I propose that the best way to deal with this would be to make a
>branch-1.  We could continue to make new feature releases off of this
>branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>This provides stability and continuity for users and developers.
>
>We could then merge these new features branches (LLAP, HBase metastore,
>CLI drop) into the trunk, as well as turn on by default newer features
>such as the vectorization and ACID.  We could also drop older, less used
>features such as support for Hadoop-1 and MapReduce.  It will be a while
>before we are ready to make stable, production ready releases of this
>code.  But we could start making alpha quality releases soon.  We would
>call these releases 2.x, to stress the non-backward compatible changes
>such as dropping Hadoop-1.  This will give users a chance to play with
>the new code and developers a chance to get feedback.
>
>Thoughts?