Posted to dev@bigtop.apache.org by Roman Shaposhnik <ro...@shaposhnik.org> on 2016/12/30 05:46:52 UTC

Major version upgrades policy

Hi!

as BIGTOP-2282 indicated, it seems that we have a bit
of a difference of opinion on how major version bumps
in the stack need to be handled. Spark 1 vs 2 and Hive
1 vs 2 are good examples.

Since JIRA is not always the best medium for a discussion,
I wanted to bring this back to the mailing list.

My biggest question is actually around the goals/assumptions
that I wanted to validate with y'all.

So, am I right in assuming that:
   #1 our implicit bias is to NOT have multiple versions of
      the same component in a stack?

   #2 we try to figure out what version is THE version based
       on how ready the component is to be integrated with the
       rest of the stack

   #3 if somebody wants to do the work to support an extra
      version -- that's fine, but that version gets the digit
      appended to its name (as in spark1), and that person
      gets to do all the work

Thanks,
Roman.

Re: Major version upgrades policy

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Wed, Jan 4, 2017 at 2:02 PM, Olaf Flebbe <of...@oflebbe.de> wrote:
> Hey,
>
>> This is fine, but even if you introduce a new component version
>> with a version appended to its name, you're still on the hook to
>> decide which one to use in some dependency's do-component-build
>> script. And that, I think, needs to be the default
>> <component name> (no version as part of the name).
>>
>
> Roman's point
>
>    #1 our implicit bias is to NOT have multiple versions of
>       the same component in a stack?
>
> seems central to me: a focal point of our project is the integration of
> big data components, preferably Apache components.
>
> With a major upgrade we did something that is bound to fail: we forced
> a component from 1.x to 2.y. If the developers follow semantic
> versioning, such a bump is guaranteed to break dependents. That
> happened with Spark, since we (that's me, for instance) didn't realize
> how deeply Spark is integrated into so many other frameworks.
>
> Arguing about COMPONENT_VERSION vs COMPONENT2_VERSION does not help
> here: the big data community is split on how to handle Spark:
>
> Apache Spark has created its own ecosystem around itself. As far as I
> can see, anyone interested in Spark will use Spark2, while anyone
> running traditional workloads will run Spark1. We cannot fix this right
> now, so that is how it is.
>
> My proposed workaround is to keep Spark1 hanging around unsupported.
> This can work out for both groups.

Exactly!

Thanks,
Roman.

Re: Major version upgrades policy

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hey,

> This is fine, but even if you introduce a new component version with a
> version appended to its name, you're still on the hook to decide which
> one to use in some dependency's do-component-build script. And that, I
> think, needs to be the default <component name> (no version as part of
> the name).
>

Roman's point

    #1 our implicit bias is to NOT have multiple versions of
       the same component in a stack?

seems central to me: a focal point of our project is the integration of
big data components, preferably Apache components.

With a major upgrade we did something that is bound to fail: we forced
a component from 1.x to 2.y. If the developers follow semantic
versioning, such a bump is guaranteed to break dependents. That
happened with Spark, since we (that's me, for instance) didn't realize
how deeply Spark is integrated into so many other frameworks.

Arguing about COMPONENT_VERSION vs COMPONENT2_VERSION does not help
here: the big data community is split on how to handle Spark:

Apache Spark has created its own ecosystem around itself. As far as I
can see, anyone interested in Spark will use Spark2, while anyone
running traditional workloads will run Spark1. We cannot fix this right
now, so that is how it is.

My proposed workaround is to keep Spark1 hanging around unsupported.
This can work out for both groups.
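
Concretely, the stack definition would then carry both majors side by
side. A minimal sketch (variable names follow the COMPONENT_VERSION
pattern above; the version numbers are illustrative, not the actual
Bigtop values):

    # Sketch only: the unversioned variable names the supported
    # default; the suffixed one names the extra major kept around.
    SPARK_VERSION=2.1.0     # 'spark'  -- the supported default
    SPARK1_VERSION=1.6.2    # 'spark1' -- unsupported, kept around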

Olaf



Re: Major version upgrades policy

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jan 2, 2017 at 5:53 PM, Konstantin Boudnik <co...@apache.org> wrote:
> I'd say there's nothing wrong with having newer versions of a component, even
> as a mere "preview" for the users who are willing to take the risk of running
> something not yet stable/well polished.

Sure, but for some components (like HDFS, etc.) this essentially means
having different stacks. For components like Pig it is easier, of
course, but the question still remains -- what's the default version in
a Bigtop release?

> My concern is that the latest approach (which is the opposite of what was
> done for Sqoop in a similar situation; that change went unnoticed because
> the original version wasn't touched) introduces a lot of hassle and breaks
> the build for an uncertain period of time. It also puts pressure on the
> people involved and might lead to haphazard patches.

I think this goes back to my point about the majority opinion on what
the default is. For Spark it is 2; for Sqoop it was 1.

All I'm saying is that the happy path (with the component not having a
version as part of its name) could be different in different cases, but
it is up to us to agree on what that happy path is.

Then, after we agree, all sorts of other versions could be added to the
stack under the <component><version> naming scheme.

What this signals to our user community is that as Bigtop maintainers
we're totally willing to support <component>, but things like
<component><version> are "use at your own risk".

> As I have said before, let's introduce new versions under a new name (like
> component2, component3, etc.) and keep the original 'component' intact until
> some later time. That seems to be a safe way of adding multiple versions
> without disrupting the stability of the stack. Clearly, an even better
> approach would be to do such work on a branch, but it might put too much
> load on our CI, etc.

This is fine, but even if you introduce a new component version with a
version appended to its name, you're still on the hook to decide which
one to use in some dependency's do-component-build script. And that, I
think, needs to be the default <component name> (no version as part of
the name).
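
For instance, a dependent component's build has to bake in exactly one
Spark. A minimal sketch of such a do-component-build (assuming the
default version is exported into the build environment as
SPARK_VERSION; the component and Maven property names are illustrative):

    #!/bin/bash
    # Hypothetical do-component-build for a component that depends on
    # Spark. There is no per-build switch between majors here, which
    # is why a single default <component name> has to exist.
    set -ex
    mvn clean package -DskipTests -Dspark.version="${SPARK_VERSION}"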

Thanks,
Roman.

Re: Major version upgrades policy

Posted by Konstantin Boudnik <co...@apache.org>.
I'd say there's nothing wrong with having newer versions of a component, even
as a mere "preview" for the users who are willing to take the risk of running
something not yet stable/well polished.

My concern is that the latest approach (which is the opposite of what was
done for Sqoop in a similar situation; that change went unnoticed because
the original version wasn't touched) introduces a lot of hassle and breaks
the build for an uncertain period of time. It also puts pressure on the
people involved and might lead to haphazard patches.

As I have said before, let's introduce new versions under a new name (like
component2, component3, etc.) and keep the original 'component' intact until
some later time. That seems to be a safe way of adding multiple versions
without disrupting the stability of the stack. Clearly, an even better
approach would be to do such work on a branch, but it might put too much
load on our CI, etc.

Thoughts?
  Cos

On Fri, Dec 30, 2016 at 01:10PM, Olaf Flebbe wrote:
> Hi Roman,
> 
> I am not absolutely convinced that #1, #2 and #3 are the right way:
> 
> There must be a way to try out new versions and see the full mess without
> ploughing through the whole big data universe.
> 
> Right now I am seeing the mess.
> 
> I was seriously running out of time: having an unsupported spark1 version
> hanging around for some emergency situations seems a lot more worthwhile
> than not having spark2 at all. I seriously doubt anyone will support spark1
> any more.
> 
> If the majority prefers to stay on the old versions, please revert.
> 
> Olaf
> 
> 
> 
> 
> 
> 
> > On 30.12.2016 at 06:46, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> > 
> > Hi!
> > 
> > as BIGTOP-2282 indicated, it seems that we have a bit
> > of a difference of opinion on how major version bumps
> > in the stack need to be handled. Spark 1 vs 2 and Hive
> > 1 vs 2 are good examples.
> > 
> > Since JIRA is not always the best medium for a discussion,
> > I wanted to bring this back to the mailing list.
> > 
> > My biggest question is actually around the goals/assumptions
> > that I wanted to validate with y'all.
> > 
> > So, am I right in assuming that:
> >   #1 our implicit bias is to NOT have multiple versions of
> >      the same component in a stack?
> > 
> >   #2 we try to figure out what version is THE version based
> >       on how ready the component is to be integrated with the
> >       rest of the stack
> > 
> >   #3 if somebody wants to do the work to support an extra
> >      version -- that's fine, but that version gets the digit
> >      appended to its name (as in spark1), and that person
> >      gets to do all the work
> > 
> > Thanks,
> > Roman.
> 

Re: Major version upgrades policy

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi Roman,

I am not absolutely convinced that #1, #2 and #3 are the right way:

There must be a way to try out new versions and see the full mess without
ploughing through the whole big data universe.

Right now I am seeing the mess.

I was seriously running out of time: having an unsupported spark1 version hanging around for some emergency situations seems a lot more worthwhile than not having spark2 at all. I seriously doubt anyone will support spark1 any more.

If the majority prefers to stay on the old versions, please revert.

Olaf






> On 30.12.2016 at 06:46, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> 
> Hi!
> 
> as BIGTOP-2282 indicated, it seems that we have a bit
> of a difference of opinion on how major version bumps
> in the stack need to be handled. Spark 1 vs 2 and Hive
> 1 vs 2 are good examples.
> 
> Since JIRA is not always the best medium for a discussion,
> I wanted to bring this back to the mailing list.
> 
> My biggest question is actually around the goals/assumptions
> that I wanted to validate with y'all.
> 
> So, am I right in assuming that:
>   #1 our implicit bias is to NOT have multiple versions of
>      the same component in a stack?
> 
>   #2 we try to figure out what version is THE version based
>       on how ready the component is to be integrated with the
>       rest of the stack
> 
>   #3 if somebody wants to do the work to support an extra
>      version -- that's fine, but that version gets the digit
>      appended to its name (as in spark1), and that person
>      gets to do all the work
> 
> Thanks,
> Roman.