Posted to dev@spark.apache.org by Patrick Wendell <pw...@gmail.com> on 2014/02/06 01:20:10 UTC

Proposal for Spark Release Strategy

Hi Everyone,

In an effort to coordinate development amongst the growing list of
Spark contributors, I've taken some time to write up a proposal to
formalize various pieces of the development process. The next release
of Spark will likely be Spark 1.0.0, so this message is intended in
part to coordinate the release plan for 1.0.0 and future releases.
I'll post this on the wiki after discussing it on this thread as
tentative project guidelines.

== Spark Release Structure ==
Starting with Spark 1.0.0, the Spark project will follow the semantic
versioning guidelines (http://semver.org/) with a few deviations.
These small differences account for Spark's nature as a multi-module
project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will have API
compatibility, as defined in [1]. Major version numbers will remain
stable over long periods of time. For instance, 1.X.Y may last 1 year
or more.
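
To make the scheme concrete, here is a small Scala sketch (the class and
method names are invented for this example and are not part of Spark)
that parses a version string and applies the "same major version implies
API compatibility" rule:

    // Hypothetical helper, for illustration only.
    case class SparkVersion(major: Int, minor: Int, maintenance: Int) {
      // Under the proposed policy, two releases promise API compatibility
      // exactly when they share the same major version number.
      def apiCompatibleWith(other: SparkVersion): Boolean =
        major == other.major
    }

    object SparkVersion {
      // Parses "[MAJOR].[MINOR].[MAINTENANCE]", e.g. "1.0.0".
      def parse(s: String): SparkVersion = s.split("\\.") match {
        case Array(maj, min, maint) =>
          SparkVersion(maj.toInt, min.toInt, maint.toInt)
        case _ =>
          throw new IllegalArgumentException("Expected MAJOR.MINOR.MAINTENANCE: " + s)
      }
    }

    // Example: 1.0.0 and 1.2.1 promise API compatibility; 1.2.1 and 2.0.0 do not.
    // SparkVersion.parse("1.0.0").apiCompatibleWith(SparkVersion.parse("1.2.1"))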

Minor releases will typically contain new features and improvements.
The target frequency for minor releases is every 3-4 months. One
change we'd like to make is to announce fixed release dates and merge
windows for each release, to facilitate coordination. Each minor
release will have a merge window where new patches can be merged, a QA
window when only fixes can be merged, then a final period where voting
occurs on release candidates. These windows will be announced
immediately after the previous minor release to give people plenty of
time, and over time, we might make the whole release process more
regular (similar to Ubuntu). At the bottom of this document is an
example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific
patches introduced (e.g. bug fixes) and their urgency. In general
these releases are designed to patch bugs. However, higher level
libraries may introduce small features, such as a new algorithm,
provided they are entirely additive and isolated from existing code
paths. Spark core may not introduce any new features in maintenance releases.

When new components are added to Spark, they may initially be marked
as "alpha". Alpha components do not have to abide by the above
guidelines; however, they should try to do so to the maximum extent
possible. Once they are marked "stable" they have to follow these
guidelines. At present, GraphX is the only alpha component of Spark.

[1] API compatibility:

An API is any public class or interface exposed in Spark that is not
marked as semi-private or experimental. Release A is API compatible
with release B if code compiled against release A *compiles cleanly*
against B. This does not guarantee that a compiled application that is
linked against version A will link cleanly against version B without
re-compiling. Link-level compatibility is something we'll try to
provide as well, and we might make it a requirement in the future, but
challenges with things like Scala versions have made this difficult to
guarantee in the past.
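
To make the distinction concrete, here is a Scala sketch (the class is
invented for illustration, not an actual Spark API) of a change that
preserves API compatibility as defined above but breaks link-level
compatibility:

    // Release A of a hypothetical user-facing class:
    class WordCounter {
      def count(path: String): Long = 0L  // stub body for illustration
    }

    // Release B of the same class adds a parameter with a default value:
    //
    //   class WordCounter {
    //     def count(path: String, minLength: Int = 1): Long = 0L
    //   }
    //
    // Application code written against release A:
    //
    //   val n = new WordCounter().count("input.txt")
    //
    // That code still *compiles cleanly* against release B (the default
    // fills in minLength), so A and B are API compatible by the definition
    // above. However, a jar compiled against A refers to the one-argument
    // bytecode signature count(String), which no longer exists in B, so
    // running that jar against B fails with a NoSuchMethodError until the
    // application is recompiled - i.e. link-level compatibility is broken.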

== Merging Pull Requests ==
To merge pull requests, committers are encouraged to use this tool [2]
to collapse the request into one commit rather than manually
performing git merges. It will also format the commit message nicely
in a way that can be easily parsed later when writing credits.
Currently it is maintained in a public utility repository, but we'll
merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1

== Deviations ==
For now, the proposal is to consider these tentative guidelines. We
can vote to formalize these as project rules at a later time after
some experience working with them. Once formalized, any deviation from
these guidelines will be subject to a lazy majority vote.

- Patrick

Re: Proposal for Spark Release Strategy

Posted by Patrick Wendell <pw...@gmail.com>.
Will,

Thanks for these thoughts - this is something we should try to be
attentive to in the way we think about versioning.

(2)-(5) are pretty consistent with the guidelines we already follow. I
think the biggest proposed difference is to be conscious of (1), which
at least I had not given much thought to in the past. Specifically, if
we make major version upgrades of dependencies within a major release
of Spark, it can cause issues for downstream packagers. I can't easily
recall how often we do this or whether this will be hard for us to
guarantee (maybe others can...). It's something to keep in mind though
- thanks for bringing it up.

- Patrick

On Fri, Feb 7, 2014 at 10:28 AM, Will Benton <wi...@redhat.com> wrote:
> Semantic versioning is great, and I think the proposed extensions for adopting it in Spark make a lot of sense.  However, by focusing strictly on public APIs, semantic versioning only solves part of the problem (albeit certainly the most interesting part).  I'd like to raise another issue that the semantic versioning guidelines explicitly exclude: the relative stability of dependencies and dependency versions.  This is less of a concern for end-users than it is for downstream packagers, but I believe that the relative stability of a dependency stack *should* be part of what is implied by a major version number.
>
> Here are some suggestions for how to incorporate dependency stack versioning into semantic versioning in order to make life easier for downstreams; please consider all of these to be prefaced with "If at all possible,":
>
> 1.  Switching a dependency to an incompatible version should be reserved for major releases.  In general, downstream operating system distributions support only one version of each library, although in rare cases alternate versions are available for backwards compatibility.  If a bug fix or feature addition in a patch or minor release depends on adopting a version of some library that is incompatible with the one used by the prior patch or minor release, then downstreams may not be able to incorporate the fix or functionality until every package impacted by the dependency can be updated to work with the new version.
>
> 2.  New dependencies should only be introduced with new features (and thus with new minor versions).  This suggestion is probably uncontroversial, since features are more likely than bugfixes to require additional external libraries.
>
> 3.  The scope of new dependencies should be proportional to the benefit that they provide.  Of course, we want to avoid reinventing the wheel, but if the alternative is pulling in a framework for WheelFactory generation, a WheelContainer library, and a dozen transitive dependencies, maybe it's worth considering reinventing at least the simplest and least general wheels.
>
> 4.  If new functionality requires additional dependencies, it should be developed to work with the most recent stable version of those libraries that is generally available.  Again, since downstreams typically support only one version per library at a time, this will make their job easier.  (This will benefit everyone, though, since the most recent version of some dependency is more likely to see active maintenance efforts.)
>
> 5.  Dependencies can be removed at any time.
>
> I hope these can be a starting point for further discussion and adoption of practices that demarcate the scope of dependency changes in a given version stream.
>
>
>
> best,
> wb
>
>
> ----- Original Message -----
>> From: "Patrick Wendell" <pw...@gmail.com>
>> To: dev@spark.incubator.apache.org
>> Sent: Wednesday, February 5, 2014 6:20:10 PM
>> Subject: Proposal for Spark Release Strategy
>>
>> [full text of the original proposal snipped; see the top of this thread]

Re: Proposal for Spark Release Strategy

Posted by Will Benton <wi...@redhat.com>.
Semantic versioning is great, and I think the proposed extensions for adopting it in Spark make a lot of sense.  However, by focusing strictly on public APIs, semantic versioning only solves part of the problem (albeit certainly the most interesting part).  I'd like to raise another issue that the semantic versioning guidelines explicitly exclude: the relative stability of dependencies and dependency versions.  This is less of a concern for end-users than it is for downstream packagers, but I believe that the relative stability of a dependency stack *should* be part of what is implied by a major version number.

Here are some suggestions for how to incorporate dependency stack versioning into semantic versioning in order to make life easier for downstreams; please consider all of these to be prefaced with "If at all possible,":

1.  Switching a dependency to an incompatible version should be reserved for major releases.  In general, downstream operating system distributions support only one version of each library, although in rare cases alternate versions are available for backwards compatibility.  If a bug fix or feature addition in a patch or minor release depends on adopting a version of some library that is incompatible with the one used by the prior patch or minor release, then downstreams may not be able to incorporate the fix or functionality until every package impacted by the dependency can be updated to work with the new version.

2.  New dependencies should only be introduced with new features (and thus with new minor versions).  This suggestion is probably uncontroversial, since features are more likely than bugfixes to require additional external libraries.

3.  The scope of new dependencies should be proportional to the benefit that they provide.  Of course, we want to avoid reinventing the wheel, but if the alternative is pulling in a framework for WheelFactory generation, a WheelContainer library, and a dozen transitive dependencies, maybe it's worth considering reinventing at least the simplest and least general wheels.

4.  If new functionality requires additional dependencies, it should be developed to work with the most recent stable version of those libraries that is generally available.  Again, since downstreams typically support only one version per library at a time, this will make their job easier.  (This will benefit everyone, though, since the most recent version of some dependency is more likely to see active maintenance efforts.)

5.  Dependencies can be removed at any time.

I hope these can be a starting point for further discussion and adoption of practices that demarcate the scope of dependency changes in a given version stream.



best,
wb


----- Original Message -----
> From: "Patrick Wendell" <pw...@gmail.com>
> To: dev@spark.incubator.apache.org
> Sent: Wednesday, February 5, 2014 6:20:10 PM
> Subject: Proposal for Spark Release Strategy
> 
> [full text of the original proposal snipped; see the top of this thread]
> 

Re: Proposal for Spark Release Strategy

Posted by Patrick Wendell <pw...@gmail.com>.
> I like Heiko's proposal that requires every pull request to reference a
> JIRA.  This is how things are done in Hadoop and it makes it much easier
> to, for example, find out whether an issue you came across when googling
> for an error is in a release.

I think this is a good idea and something on which there is wide
consensus. I was separately going to suggest this in a later e-mail
(it's not directly tied to versioning). One of many reasons this is
necessary is that it's becoming hard to track which features ended
up in which releases.

> I agree with Mridul about binary compatibility.  It can be a dealbreaker
> for organizations that are considering an upgrade. The two ways I'm aware
> of that break binary compatibility are Scala version upgrades and messing
> around with inheritance.  Are these not avoidable, at least for minor
> releases?

This is clearly a goal, but I'm hesitant to codify it until we
understand all of the reasons why it might not work. I've heard that,
in general, with Scala there are many non-obvious things that can break
binary compatibility, and we need to understand what they are. I'd
propose we add the migration tool [1] to our build and use it for
a few months to see what happens (hat tip to Michael Armbrust).
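
For reference, here is a rough sketch of what wiring MiMa (the migration
manager referenced in [1]) into an sbt build might look like. The plugin
coordinates, setting names, and version numbers below are assumptions
based on the plugin's public documentation and have changed across plugin
releases, so treat this as illustrative rather than as Spark's actual
build configuration:

    // project/plugins.sbt - pull in the MiMa sbt plugin (version is an assumption).
    addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

    // build.sbt - compare the current code against a previously released
    // artifact. The organization/name/version are placeholders, not Spark's.
    mimaPreviousArtifacts := Set("org.example" %% "example-core" % "1.0.0")

    // Running `sbt mimaReportBinaryIssues` then reports binary-incompatible
    // changes (removed methods, changed signatures, etc.) relative to 1.0.0.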

It's easy to formalize this as a requirement later; it's impossible to
go the other direction. For Scala major versions it's possible we can
cross-build between 2.10 and 2.11 to retain link-level compatibility.
It's just entirely uncharted territory, and AFAIK no one who's
suggesting this is speaking from experience maintaining this guarantee
for a Scala project.
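
(For what it's worth, the cross-building itself is mechanically simple in
sbt; whether the resulting artifacts stay link-compatible over time is the
open question. A minimal sketch, with placeholder Scala versions:)

    // build.sbt - build the same sources once per Scala line.
    crossScalaVersions := Seq("2.10.4", "2.11.0")

    // `sbt +package` (or `sbt +publish`) then runs the build for each Scala
    // version, producing separately suffixed artifacts (e.g. _2.10, _2.11),
    // since artifacts compiled by different Scala major versions are not
    // binary compatible with each other.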

That would be the most convincing reason for me - if someone has
actually done this in the past in a Scala project and speaks from
experience. Most of us are speaking from the perspective of Java
projects, where we understand well the trade-offs and costs of
maintaining this guarantee.

[1] https://github.com/typesafehub/migration-manager

- Patrick

Re: Proposal for Spark Release Strategy

Posted by Sandy Ryza <sa...@cloudera.com>.
Thanks for all this Patrick.

I like Heiko's proposal that requires every pull request to reference a
JIRA.  This is how things are done in Hadoop and it makes it much easier
to, for example, find out whether an issue you came across when googling
for an error is in a release.

I agree with Mridul about binary compatibility.  It can be a dealbreaker
for organizations that are considering an upgrade. The two ways I'm aware
of that break binary compatibility are Scala version upgrades and messing
around with inheritance.  Are these not avoidable, at least for minor
releases?

-Sandy




On Thu, Feb 6, 2014 at 12:49 AM, Mridul Muralidharan <mr...@gmail.com> wrote:

> The reason I explicitly mentioned about binary compatibility was
> because it was sort of hand waved in the proposal as good to have.
> My understanding is that scala does make it painful to ensure binary
> compatibility - but stability of interfaces is vital to ensure
> dependable platforms.
> Recompilation might be a viable option for developers - not for users.
>
> Regards,
> Mridul
>
>
> On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> > If people feel that merging the intermediate SNAPSHOT number is
> > significant, let's just defer merging that until this discussion
> > concludes.
> >
> > That said - the decision to settle on 1.0 for the next release is not
> > just because it happens to come after 0.9. It's a conscientious
> > decision based on the development of the project to this point. A
> > major focus of the 0.9 release was tying off loose ends in terms of
> > backwards compatibility (e.g. spark configuration). There was some
> > discussion back then of maybe cutting a 1.0 release but the decision
> > was deferred until after 0.9.
> >
> > @mridul - pleas see the original post for discussion about binary
> compatibility.
> >
> > On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski <an...@gmail.com>
> wrote:
> >> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> >> discussion.
> >> On Feb 5, 2014 9:53 PM, "Andrew Ash" <an...@andrewash.com> wrote:
> >>
> >>> Agree on timeboxed releases as well.
> >>>
> >>> Is there a vision for where we want to be as a project before
> declaring the
> >>> first 1.0 release?  While we're in the 0.x days per semver we can break
> >>> backcompat at will (though we try to avoid it where possible), and that
> >>> luxury goes away with 1.x  I just don't want to release a 1.0 simply
> >>> because it seems to follow after 0.9 rather than making an intentional
> >>> decision that we're at the point where we can stand by the current
> APIs and
> >>> binary compatibility for the next year or so of the major release.
> >>>
> >>> Until that decision is made as a group I'd rather we do an immediate
> >>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
> 1.0
> >>> but not the other way around.
> >>>
> >>> https://github.com/apache/incubator-spark/pull/542
> >>>
> >>> Cheers!
> >>> Andrew
> >>>
> >>>
> >>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
> >>> >wrote:
> >>>
> >>> > +1 on time boxed releases and compatibility guidelines
> >>> >
> >>> >
> >>> > > On 06.02.2014 at 01:20, Patrick Wendell <pwendell@gmail.com> wrote:
> >>> > >
> >>> > > [full text of the original proposal snipped; see the top of this thread]
> >>>
>

Re: Proposal for Spark Release Strategy

Posted by Mridul Muralidharan <mr...@gmail.com>.
The reason I explicitly mentioned binary compatibility is that it was
sort of hand-waved in the proposal as a nice-to-have.
My understanding is that Scala does make it painful to ensure binary
compatibility - but stability of interfaces is vital to ensure
dependable platforms.
Recompilation might be a viable option for developers - not for users.

Regards,
Mridul


On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell <pw...@gmail.com> wrote:
> If people feel that merging the intermediate SNAPSHOT number is
> significant, let's just defer merging that until this discussion
> concludes.
>
> That said - the decision to settle on 1.0 for the next release is not
> just because it happens to come after 0.9. It's a conscientious
> decision based on the development of the project to this point. A
> major focus of the 0.9 release was tying off loose ends in terms of
> backwards compatibility (e.g. spark configuration). There was some
> discussion back then of maybe cutting a 1.0 release but the decision
> was deferred until after 0.9.
>
> @mridul - pleas see the original post for discussion about binary compatibility.
>
> On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski <an...@gmail.com> wrote:
>> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
>> discussion.
>> On Feb 5, 2014 9:53 PM, "Andrew Ash" <an...@andrewash.com> wrote:
>>
>>> Agree on timeboxed releases as well.
>>>
>>> Is there a vision for where we want to be as a project before declaring the
>>> first 1.0 release?  While we're in the 0.x days per semver we can break
>>> backcompat at will (though we try to avoid it where possible), and that
>>> luxury goes away with 1.x  I just don't want to release a 1.0 simply
>>> because it seems to follow after 0.9 rather than making an intentional
>>> decision that we're at the point where we can stand by the current APIs and
>>> binary compatibility for the next year or so of the major release.
>>>
>>> Until that decision is made as a group I'd rather we do an immediate
>>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>>> but not the other way around.
>>>
>>> https://github.com/apache/incubator-spark/pull/542
>>>
>>> Cheers!
>>> Andrew
>>>
>>>
>>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
>>> >wrote:
>>>
>>> > +1 on time boxed releases and compatibility guidelines
>>> >
>>> >
>>> > > On 06.02.2014 at 01:20, Patrick Wendell <pw...@gmail.com> wrote:
>>> > >
>>> > > [full text of the original proposal snipped; see the top of this thread]
>>>

Re: Proposal for Spark Release Strategy

Posted by Patrick Wendell <pw...@gmail.com>.
If people feel that merging the intermediate SNAPSHOT number is
significant, let's just defer merging that until this discussion
concludes.

That said - the decision to settle on 1.0 for the next release is not
just because it happens to come after 0.9. It's a conscious
decision based on the development of the project to this point. A
major focus of the 0.9 release was tying off loose ends in terms of
backwards compatibility (e.g. spark configuration). There was some
discussion back then of maybe cutting a 1.0 release but the decision
was deferred until after 0.9.

@mridul - please see the original post for discussion about binary compatibility.

On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski <an...@gmail.com> wrote:
> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> discussion.
> On Feb 5, 2014 9:53 PM, "Andrew Ash" <an...@andrewash.com> wrote:
>
>> Agree on timeboxed releases as well.
>>
>> Is there a vision for where we want to be as a project before declaring the
>> first 1.0 release?  While we're in the 0.x days per semver we can break
>> backcompat at will (though we try to avoid it where possible), and that
>> luxury goes away with 1.x  I just don't want to release a 1.0 simply
>> because it seems to follow after 0.9 rather than making an intentional
>> decision that we're at the point where we can stand by the current APIs and
>> binary compatibility for the next year or so of the major release.
>>
>> Until that decision is made as a group I'd rather we do an immediate
>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>> but not the other way around.
>>
>> https://github.com/apache/incubator-spark/pull/542
>>
>> Cheers!
>> Andrew
>>
>>
>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
>> >wrote:
>>
>> > +1 on time boxed releases and compatibility guidelines
>> >
>> >
>> > > On 06.02.2014 at 01:20, Patrick Wendell <pw...@gmail.com> wrote:
>> > >
>> > > [full text of the original proposal snipped; see the top of this thread]
>>

Re: Proposal for Spark Release Strategy

Posted by Heiko Braun <ik...@googlemail.com>.
If we could minimize the external dependencies, it would certainly be beneficial long term. 


> On 06.02.2014 at 07:37, Mridul Muralidharan <mr...@gmail.com> wrote:
> 
> 
> b) minimize external dependencies - some of them would go away/not be
> actively maintained.

Re: Proposal for Spark Release Strategy

Posted by Mridul Muralidharan <mr...@gmail.com>.
Before we move to 1.0, we need to address two things:

a) backward compatibility not just at the API level, but also at the
binary level (not forcing a recompile).

b) minimize external dependencies - some of them may go away or not be
actively maintained.


Regards,
Mridul


On Thu, Feb 6, 2014 at 11:50 AM, Andy Konwinski <an...@gmail.com> wrote:
> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> discussion.
> On Feb 5, 2014 9:53 PM, "Andrew Ash" <an...@andrewash.com> wrote:
>
>> Agree on timeboxed releases as well.
>>
>> Is there a vision for where we want to be as a project before declaring the
>> first 1.0 release?  While we're in the 0.x days per semver we can break
>> backcompat at will (though we try to avoid it where possible), and that
>> luxury goes away with 1.x  I just don't want to release a 1.0 simply
>> because it seems to follow after 0.9 rather than making an intentional
>> decision that we're at the point where we can stand by the current APIs and
>> binary compatibility for the next year or so of the major release.
>>
>> Until that decision is made as a group I'd rather we do an immediate
>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>> but not the other way around.
>>
>> https://github.com/apache/incubator-spark/pull/542
>>
>> Cheers!
>> Andrew
>>
>>
>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
>> >wrote:
>>
>> > +1 on time boxed releases and compatibility guidelines
>> >
>> >
>> > > On 06.02.2014 at 01:20, Patrick Wendell <pw...@gmail.com> wrote:
>> > >
>> > > [full text of the original proposal snipped; see the top of this thread]
>>

Re: Proposal for Spark Release Strategy

Posted by Andy Konwinski <an...@gmail.com>.
+1 for 0.10.0 now with the option to switch to 1.0.0 after further
discussion.
On Feb 5, 2014 9:53 PM, "Andrew Ash" <an...@andrewash.com> wrote:

> Agree on timeboxed releases as well.
>
> Is there a vision for where we want to be as a project before declaring the
> first 1.0 release?  While we're in the 0.x days per semver we can break
> backcompat at will (though we try to avoid it where possible), and that
> luxury goes away with 1.x  I just don't want to release a 1.0 simply
> because it seems to follow after 0.9 rather than making an intentional
> decision that we're at the point where we can stand by the current APIs and
> binary compatibility for the next year or so of the major release.
>
> Until that decision is made as a group I'd rather we do an immediate
> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> but not the other way around.
>
> https://github.com/apache/incubator-spark/pull/542
>
> Cheers!
> Andrew
>
>
> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
> >wrote:
>
> > +1 on time boxed releases and compatibility guidelines
> >
> >
> > > On 06.02.2014 at 01:20, Patrick Wendell <pw...@gmail.com> wrote:
> > >
> > > Hi Everyone,
> > >
> > > In an effort to coordinate development amongst the growing list of
> > > Spark contributors, I've taken some time to write up a proposal to
> > > formalize various pieces of the development process. The next release
> > > of Spark will likely be Spark 1.0.0, so this message is intended in
> > > part to coordinate the release plan for 1.0.0 and future releases.
> > > I'll post this on the wiki after discussing it on this thread as
> > > tentative project guidelines.
> > >
> > > == Spark Release Structure ==
> > > Starting with Spark 1.0.0, the Spark project will follow the semantic
> > > versioning guidelines (http://semver.org/) with a few deviations.
> > > These small differences account for Spark's nature as a multi-module
> > > project.
> > >
> > > Each Spark release will be versioned:
> > > [MAJOR].[MINOR].[MAINTENANCE]
> > >
> > > All releases with the same major version number will have API
> > > compatibility, defined as [1]. Major version numbers will remain
> > > stable over long periods of time. For instance, 1.X.Y may last 1 year
> > > or more.
> > >
> > > Minor releases will typically contain new features and improvements.
> > > The target frequency for minor releases is every 3-4 months. One
> > > change we'd like to make is to announce fixed release dates and merge
> > > windows for each release, to facilitate coordination. Each minor
> > > release will have a merge window where new patches can be merged, a QA
> > > window when only fixes can be merged, then a final period where voting
> > > occurs on release candidates. These windows will be announced
> > > immediately after the previous minor release to give people plenty of
> > > time, and over time, we might make the whole release process more
> > > regular (similar to Ubuntu). At the bottom of this document is an
> > > example window for the 1.0.0 release.
> > >
> > > Maintenance releases will occur more frequently and depend on specific
> > > patches introduced (e.g. bug fixes) and their urgency. In general
> > > these releases are designed to patch bugs. However, higher level
> > > libraries may introduce small features, such as a new algorithm,
> > > provided they are entirely additive and isolated from existing code
> > > paths. Spark core may not introduce any features.
> > >
> > > When new components are added to Spark, they may initially be marked
> > > as "alpha". Alpha components do not have to abide by the above
> > > guidelines, however, to the maximum extent possible, they should try
> > > to. Once they are marked "stable" they have to follow these
> > > guidelines. At present, GraphX is the only alpha component of Spark.
> > >
> > > [1] API compatibility:
> > >
> > > An API is any public class or interface exposed in Spark that is not
> > > marked as semi-private or experimental. Release A is API compatible
> > > with release B if code compiled against release A *compiles cleanly*
> > > against B. This does not guarantee that a compiled application that is
> > > linked against version A will link cleanly against version B without
> > > re-compiling. Link-level compatibility is something we'll try to
> > > guarantee that as well, and we might make it a requirement in the
> > > future, but challenges with things like Scala versions have made this
> > > difficult to guarantee in the past.
> > >
> > > == Merging Pull Requests ==
> > > To merge pull requests, committers are encouraged to use this tool [2]
> > > to collapse the request into one commit rather than manually
> > > performing git merges. It will also format the commit message nicely
> > > in a way that can be easily parsed later when writing credits.
> > > Currently it is maintained in a public utility repository, but we'll
> > > merge it into mainline Spark soon.
> > >
> > > [2]
> > https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
> > >
> > > == Tentative Release Window for 1.0.0 ==
> > > Feb 1st - April 1st: General development
> > > April 1st: Code freeze for new features
> > > April 15th: RC1
> > >
> > > == Deviations ==
> > > For now, the proposal is to consider these tentative guidelines. We
> > > can vote to formalize these as project rules at a later time after
> > > some experience working with them. Once formalized, any deviation to
> > > these guidelines will be subject to a lazy majority vote.
> > >
> > > - Patrick
> >
>

Re: Proposal for Spark Release Strategy

Posted by Patrick Wendell <pw...@gmail.com>.
Just to echo others - the relevant question is whether we want to
advertise stable APIs for users that we will support over a long time
horizon. Doing this is critical to being taken seriously as a mature
project.

The question is not whether or not there are things we want to improve
about Spark (further reduce dependencies, runtime stability, etc) - of
course everyone wants to improve those things!

In the next few months ahead of 1.0 the plan would be to invest effort
in finishing off loose ends in the API, and of course no 1.0 release
candidate will pass muster if these aren't addressed. I only see a few
fairly small blockers, though, with respect to API issues:

- We should mark things that may evolve and change as semi-private
developer APIs (e.g. SparkListener); a rough sketch of what that marking
could look like follows below.
- We need to standardize the Java API in a way that supports Java 8 lambdas.
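
As a self-contained illustration of the first point (everything in the
sketch is made up for the example - the annotation name, its placement, and
the event type are hypothetical stand-ins, not an existing Spark API),
marking a semi-private developer API could be as small as a documented
marker annotation on the trait in question:

object DeveloperApiSketch {

  // Hypothetical marker annotation; the real mechanism could just as well
  // be a scaladoc tag or a dedicated package.
  class DeveloperApi extends scala.annotation.StaticAnnotation

  // Hypothetical event type standing in for the real SparkListener events.
  case class JobStartEvent(jobId: Int)

  // The listener trait carries the annotation (and matching docs) to signal
  // that it sits outside the 1.x compatibility promise and may still evolve.
  @DeveloperApi
  trait Listener {
    def onJobStart(event: JobStartEvent): Unit = { }
  }

  def main(args: Array[String]): Unit = {
    val listener = new Listener {
      override def onJobStart(event: JobStartEvent): Unit =
        println(s"job ${event.jobId} started")
    }
    listener.onJobStart(JobStartEvent(1))
  }
}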

Other than that - I don't see many blockers in terms of API changes we
might want to make. A lot of those were dealt with in 0.9 specifically
to prepare for this.

The broader question of API "clean-up" brings up a debate about the
trade-off against compatibility with older pre-1.0 versions of Spark. This
is not the primary issue under discussion and can be debated separately.

The primary issue at hand is whether to have 1.0 in ~3 months vs
pushing it to ~6 months from now or more.

- Patrick

On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza <sa...@cloudera.com> wrote:
> If the APIs are usable, stability and continuity are much more important
> than perfection.  With many already relying on the current APIs, I think
> trying to clean them up will just cause pain for users and integrators.
>  Hadoop made this mistake when they decided the original MapReduce APIs
> were ugly and introduced a new set of APIs to do the same thing.  Even
> though this happened in a pre-1.0 release, three years down the road, both
> the old and new APIs are still supported, causing endless confusion for
> users.  If individual functions or configuration properties have unclear
> names, they can be deprecated and replaced, but redoing the APIs or
> breaking compatibility at this point is simply not worth it.

Re: Proposal for Spark Release Strategy

Posted by Mark Hamstra <ma...@clearstorydata.com>.
I'm not sure that that is the conclusion that I would draw from the Hadoop
example.  I would certainly agree that maintaining and supporting both an
old and a new API is a cause of endless confusion for users.  If we are
going to change or drop things from the API to reach 1.0, then we shouldn't
be maintaining and supporting the prior way of doing things beyond a 1.0.0 ->
1.1.0 deprecation cycle.


On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> If the APIs are usable, stability and continuity are much more important
> than perfection.  With many already relying on the current APIs, I think
> trying to clean them up will just cause pain for users and integrators.
>  Hadoop made this mistake when they decided the original MapReduce APIs
> were ugly and introduced a new set of APIs to do the same thing.  Even
> though this happened in a pre-1.0 release, three years down the road, both
> the old and new APIs are still supported, causing endless confusion for
> users.  If individual functions or configuration properties have unclear
> names, they can be deprecated and replaced, but redoing the APIs or
> breaking compatibility at this point is simply not worth it.

Re: Proposal for Spark Release Strategy

Posted by Sandy Ryza <sa...@cloudera.com>.
If the APIs are usable, stability and continuity are much more important
than perfection.  With many already relying on the current APIs, I think
trying to clean them up will just cause pain for users and integrators.
 Hadoop made this mistake when they decided the original MapReduce APIs
were ugly and introduced a new set of APIs to do the same thing.  Even
though this happened in a pre-1.0 release, three years down the road, both
the old and new APIs are still supported, causing endless confusion for
users.  If individual functions or configuration properties have unclear
names, they can be deprecated and replaced, but redoing the APIs or
breaking compatibility at this point is simply not worth it.


On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid <im...@quantifind.com> wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
>
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?
> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)
> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?
>
> This is probably not the right list of questions, that's just an idea of
> the kind of thing we should be thinking about.
>
> Its also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.

Re: Proposal for Spark Release Strategy

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Imran:

> Its also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.


And moving master to 1.0.0-SNAPSHOT doesn't preclude that.  If anything, it
turns that "ought to" into "must" -- which is another way of saying what
Reynold said: "The point of 1.0 is for us to self-enforce API compatibility
in the context of longer term support. If we continue down the 0.xx road,
we will always have excuse for breaking APIs."

1.0.0-SNAPSHOT doesn't mean that the API is final right now.  It means that
what is released next will be final for what is intended to be the lengthy
lifespan of a major release.  That means that adding new features and
functionality (at least to core spark) should be a very low priority for
this development cycle, and establishing the 1.0 API from what is already
in 0.9.0 should be our first priority.  It wouldn't trouble me at all if
not-strictly-necessary new features were left to hang out on the pull
request queue for quite a while until we are ready to add them in 1.1.0, if
we were to do pretty much nothing else during this cycle except to get the
1.0 API to where most of us agree that it is in good shape.

If we're not adding new features and extending the 0.9.0 API, then there
really is no need for a 0.10.0 minor-release, whose main purpose would be
to collect the API additions from 0.9.0.  Bug-fixes go in 0.9.1-SNAPSHOT;
bug-fixes and finalized 1.0 API go in 1.0.0-SNAPSHOT; almost all new
features are put on hold and wait for 1.1.0-SNAPSHOT.

... it seems possible that there could be new features we'd like to release
> in 0.10...


We certainly can add new features to 1.0.0, but they will have to go
through a rigorous review to be certain that they are things that we really
want to commit to keeping going forward.  But after 1.0, that is true for
any new feature proposal unless we create specifically experimental
branches.  So what moving to 1.0.0-SNAPSHOT really means is that we are
saying that we have gone beyond the development phase where more-or-less
experimental features can be added to Spark releases only to be withdrawn
later -- that time is done after 1.0.0-SNAPSHOT.  Now to be fair,
tentative/experimental features have not been added willy-nilly to Spark
over recent releases, and withdrawal/replacement has been about as limited
in scope as could be fairly expected, so this shouldn't be a radically new
and different development paradigm.  There are, though, some experiments
that were added in the past and should probably now be withdrawn (or at
least deprecated in 1.0.0, withdrawn in 1.1.0.)  I'll put my own
contribution of mapWith, filterWith, et al. on the chopping block as an
effort that, at least in its present form, doesn't provide enough extra
over mapPartitionsWithIndex, and whose syntax is awkward enough that I
don't believe these methods have ever been widely used, so that their
inclusion in the 1.0 API is probably not warranted.
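
To make that concrete, the typical mapWith use case - per-partition setup
such as a seeded RNG, followed by a per-element function - can be written
with mapPartitionsWithIndex alone. A minimal sketch (illustrative only, not
code from any Spark branch):

import scala.util.Random

import org.apache.spark.SparkContext

object MapWithAlternative {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "mapWith-alternative")
    val data = sc.parallelize(1 to 100, 4)

    // mapWith-style pattern: build something once per partition (here a
    // Random seeded by the partition index), then apply a per-element
    // function, using the primitive that would remain in the API.
    val jittered = data.mapPartitionsWithIndex { (index, iter) =>
      val rng = new Random(index)
      iter.map(x => x + rng.nextDouble())
    }

    jittered.take(5).foreach(println)
    sc.stop()
  }
}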

There are other elements of Spark that also should be culled and/or
refactored before 1.0.  Imran has listed a few. I'll also suggest that
there are at least parts of alternative Broadcast variable implementations
that should probably be left behind.  In any event, Imran is absolutely
correct that we need to have a discussion about these issues.  Moving to
1.0.0-SNAPSHOT forces us to begin that discussion.

So, I'm +1 for 1.0.0-incubating-SNAPSHOT (and looking forward to losing the
"incubating"!)




On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid <im...@quantifind.com> wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
>
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?
> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)
> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?
>
> This is probably not the right list of questions, that's just an idea of
> the kind of thing we should be thinking about.
>
> Its also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.

Re: Proposal for Spark Release Strategy

Posted by Matei Zaharia <ma...@gmail.com>.
I think these are good questions to bring up, Imran. Here are my thoughts on them (I’ve thought about some of these in the past):

On Feb 6, 2014, at 12:39 PM, Imran Rashid <im...@quantifind.com> wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
> 
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?

I know that some names are suboptimal, but I absolutely detest breaking APIs, config names, etc. I’ve seen it happen way too often in other projects (even things we depend on that are officially post-1.0, like Akka or Protobuf or Hadoop), and it’s very painful. I think that we as fairly cutting-edge users are okay with libraries occasionally changing, but many others will consider it a show-stopper. Given this, I think that any cosmetic change now, even though it might improve clarity slightly, is not worth the tradeoff in terms of creating an update barrier for existing users.

> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)

Again, for the reason above, I’d keep them where they are and consider adding other stuff later. Also personally I want to optimize the API for usability, not for Spark developers. If it’s easier for a user to call RDD.mapPartitions instead of AdvancedUtils.mapPartitions(rdd, func), and the only cost is a longer RDD.scala class, I’d go for the former. If you think there are some API methods that should just go away, that would be good to discuss — we can deprecate them for example.
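
For illustration, the deprecation route might look like the following sketch
(the class and method names here are invented for the example, not actual
Spark code): the old entry point keeps compiling for a release cycle, warns
at every call site, and forwards to its replacement.

object DeprecationSketch {

  class VectorOps(values: Array[Double]) {

    // Old name kept source-compatible for a deprecation cycle; scalac warns
    // and points users at the replacement.
    @deprecated("Use addScalar instead", "1.0.0")
    def plusScalar(x: Double): Array[Double] = addScalar(x)

    def addScalar(x: Double): Array[Double] = values.map(_ + x)
  }

  def main(args: Array[String]): Unit = {
    val ops = new VectorOps(Array(1.0, 2.0, 3.0))
    // Still compiles, but emits a deprecation warning.
    println(ops.plusScalar(1.0).mkString(","))
  }
}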

> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?

This is an API that will probably be experimental or semi-private at first.

Anyway, as I said, these are good questions — I’d be happy to see suggestions on any of these fronts. I just wanted to point out the importance of compatibility. I think it’s been awesome that most of our users have been able to keep up with the latest version of Spark, getting all the new fixes and simultaneously increasing the amount of contributions we get on master and decreasing the backporting burden on old branches. We might take it for granted, but I’ve seen similar projects that didn't manage to do this. In particular, compatibility in Hadoop has been a mess, with some major users diverging from Apache early (e.g. Facebook) and never being able to contribute back, and with big API cleanups (e.g. mapred -> mapreduce) being proposed after the project already had a lot of momentum and never making it through. The experience of seeing those has made me very conservative. The longer we can keep a unified community, the better it will be for all users of the project.

Matei


Re: Proposal for Spark Release Strategy

Posted by Imran Rashid <im...@quantifind.com>.
I don't really agree with this logic.  I think we haven't broken the API so
far because we just keep adding stuff onto it, and we haven't bothered to
clean the API up, specifically to *avoid* breaking things.  Here's a
handful of API-breaking things that we might want to consider:

* Should we look at all the various configuration properties, and maybe
rename some of them for consistency / clarity?
* Do all of the functions on RDD need to be in core?  Or do some of them
that are simple additions built on top of the primitives really belong in a
"utils" package or something?  E.g., maybe we should get rid of all the
variants of mapPartitions / mapWith / etc. and just have map and
mapPartitionsWithIndex (too many choices in the API can also be confusing
to the user; a sketch of that consolidation follows below).
* Are the right things getting tracked in SparkListener?  Do we need to add
or remove anything?

This is probably not the right list of questions; it's just an idea of
the kind of thing we should be thinking about.
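
To make the mapWith point concrete, here's a rough sketch of how a typical
mapWith call can already be expressed with mapPartitionsWithIndex alone.
This is written from memory against the 0.9-era RDD API, so treat the exact
signatures as approximate; it assumes a SparkContext `sc`, e.g. in
spark-shell:

    import scala.util.Random

    val nums = sc.parallelize(1 to 100, 4)

    // With the mapWith variant: build one Random per partition, seeded by
    // the partition index, and use it for every element in that partition.
    val jittered1 = nums.mapWith(index => new Random(index)) {
      (x, rng) => x + rng.nextDouble()
    }

    // The same thing with only mapPartitionsWithIndex -- the kind of
    // consolidation suggested above.
    val jittered2 = nums.mapPartitionsWithIndex { (index, iter) =>
      val rng = new Random(index)
      iter.map(x => x + rng.nextDouble())
    }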

It's also fine with me if 1.0 is next; I just think that we ought to be
asking these kinds of questions up and down the entire API before we
release 1.0.  And given that we haven't even started that discussion, it
seems possible that there could be new features we'd like to release in
0.10 before that discussion is finished.



On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia <ma...@gmail.com>wrote:

> I think it's important to do 1.0 next. The project has been around for 4
> years, and I'd be comfortable maintaining the current codebase for a long
> time in an API and binary compatible way through 1.x releases. Over the
> past 4 years we haven't actually had major changes to the user-facing API --
> the only ones were changing the package to org.apache.spark, and upgrading
> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
> two things: it tells users that they can be confident that version will be
> maintained for a long time, which we absolutely want to do, and it lets
> outsiders see that the project is now fairly mature (for many people,
> pre-1.0 might still cause them not to try it). I think both are good for
> the community.
>
> Regarding binary compatibility, I agree that it's what we should strive
> for, but it just seems premature to codify now. Let's see how it works
> between, say, 1.0 and 1.1, and then we can codify it.
>
> Matei
>
> On Feb 6, 2014, at 10:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > Thanks Patick to initiate the discussion about next road map for Apache
> Spark.
> >
> > I am +1 for 0.10.0 for next version.
> >
> > It will give us as community some time to digest the process and the
> > vision and make adjustment accordingly.
> >
> > Release a 1.0.0 is a huge milestone and if we do need to break API
> > somehow or modify internal behavior dramatically we could take
> > advantage to release 1.0.0 as good step to go to.
> >
> >
> > - Henry
> >
> >
> >
> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash <an...@andrewash.com> wrote:
> >> Agree on timeboxed releases as well.
> >>
> >> Is there a vision for where we want to be as a project before declaring
> the
> >> first 1.0 release?  While we're in the 0.x days per semver we can break
> >> backcompat at will (though we try to avoid it where possible), and that
> >> luxury goes away with 1.x  I just don't want to release a 1.0 simply
> >> because it seems to follow after 0.9 rather than making an intentional
> >> decision that we're at the point where we can stand by the current APIs
> and
> >> binary compatibility for the next year or so of the major release.
> >>
> >> Until that decision is made as a group I'd rather we do an immediate
> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> >> but not the other way around.
> >>
> >> https://github.com/apache/incubator-spark/pull/542
> >>
> >> Cheers!
> >> Andrew
> >>
> >>
> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <ike.braun@googlemail.com
> >wrote:
> >>
> >>> +1 on time boxed releases and compatibility guidelines
> >>>
> >>>

Re: Proposal for Spark Release Strategy

Posted by Matei Zaharia <ma...@gmail.com>.
On Feb 6, 2014, at 11:04 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> *Would it make sense to put in something that strongly discourages binary
> incompatible changes when possible?

Yes, I like this idea. Let’s just say we’ll strive for this as much as possible, and think about codifying it once we have some experience doing so.
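
As a purely hypothetical illustration of the gap between the two (this is
not Spark code, and the class names are made up), even a change that every
caller can recompile against without edits can still break already-compiled
applications:

    // Two snapshots of the same public class across two releases; they are
    // named V1/V2 here only so the sketch compiles as a single file.

    // Release A:
    class UploaderV1 {
      def upload(path: String): Unit = println(s"uploading $path")
    }

    // Release B: existing call sites like upload("/tmp/data") still compile,
    // so this is source-compatible. But the JVM signature changes from
    // upload(String) to upload(String, boolean), so an application compiled
    // against release A and run against release B without recompiling fails
    // to link with a NoSuchMethodError.
    class UploaderV2 {
      def upload(path: String, overwrite: Boolean = false): Unit =
        println(s"uploading $path (overwrite=$overwrite)")
    }

    object Client {
      def main(args: Array[String]): Unit = {
        new UploaderV1().upload("/tmp/data")
        new UploaderV2().upload("/tmp/data")  // same source, new binary signature
      }
    }

A tool like MiMa (the Scala migration manager) can flag this class of change
mechanically, which is roughly what "striving for it" could look like in
practice.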

Matei



Re: Proposal for Spark Release Strategy

Posted by Matei Zaharia <ma...@gmail.com>.
On Feb 6, 2014, at 11:56 AM, Evan Chan <ev...@ooyala.com> wrote:

> The other reason for waiting are things like stability.
> 
> It would be great to have as a goal for 1.0.0 that under most heavy
> use scenarios, workers and executors don't just die, which is not true
> today.
> Also, there should be minimal "silent failures" which are difficult to debug.
> 

I think this is orthogonal to the version number. 1.x versions can have bugs — it’s almost unavoidable in the distributed system space. The version number is more about the level of compatibility and support people can expect, which I think is something we want to solidify. Calling it 1.x will also make it more likely that we have long-term maintenance releases, because with the current project, people expect that they have to keep jumping to the latest version. Just as an example, when we did a survey a while back, out of ~100 respondents, all were either on the very latest release or on master (!). I’ve had multiple people ask me about longer-term supported versions (e.g. if I download 1.x now, will it still have maintenance releases a year from now, or will it be left in the dust).

Matei


Re: Proposal for Spark Release Strategy

Posted by Evan Chan <ev...@ooyala.com>.
The other reason for waiting is stability.

It would be great to have as a goal for 1.0.0 that under most heavy
use scenarios, workers and executors don't just die, which is not the
case today.
Also, there should be minimal "silent failures", which are difficult to debug.

On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan <ev...@ooyala.com> wrote:
> +1 for 0.10.0.
>
> It would give more time to study things (such as the new SparkConf)
> and let the community decide if any breaking API changes are needed.
>
> Also, a +1 for minor revisions not breaking code compatibility,
> including Scala versions.   (I guess this would mean that 1.x would
> stay on Scala 2.10.x)
>
>
>
>
> --
> --
> Evan Chan
> Staff Engineer
> ev@ooyala.com  |



-- 
--
Evan Chan
Staff Engineer
ev@ooyala.com  |

Re: Proposal for Spark Release Strategy

Posted by Reynold Xin <rx...@databricks.com>.
+1 for 1.0


The point of 1.0 is for us to self-enforce API compatibility in the context
of longer-term support. If we continue down the 0.xx road, we will always
have an excuse for breaking APIs. That said, a major focus of 0.9, and some
of the work happening for 1.0 (e.g. configuration, Java 8 closure support,
security), is better API compatibility in the 1.x releases.

While not perfect, Spark as it stands is already more mature than many (ASF)
projects that are versioned 1.x, 2.x, or even 10.x. Software releases are
always a moving target. 1.0 doesn't mean the release is "perfect" or "final".
The project will still evolve.




On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan <ev...@ooyala.com> wrote:

> +1 for 0.10.0.
>
> It would give more time to study things (such as the new SparkConf)
> and let the community decide if any breaking API changes are needed.
>
> Also, a +1 for minor revisions not breaking code compatibility,
> including Scala versions.   (I guess this would mean that 1.x would
> stay on Scala 2.10.x)
>
>
>
>
> --
> --
> Evan Chan
> Staff Engineer
> ev@ooyala.com  |
>

Re: Proposal for Spark Release Strategy

Posted by Evan Chan <ev...@ooyala.com>.
+1 for 0.10.0.

It would give more time to study things (such as the new SparkConf)
and let the community decide if any breaking API changes are needed.
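
For anyone who hasn't played with it yet, the new configuration style looks
roughly like this -- a minimal sketch from memory of the 0.9 API, so
double-check the exact method names against the docs:

    import org.apache.spark.{SparkConf, SparkContext}

    // Programmatic configuration via SparkConf, replacing the older
    // system-property / constructor-argument style.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("ConfExample")
      .set("spark.executor.memory", "1g")

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).reduce(_ + _))
    sc.stop()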

Also, a +1 for minor revisions not breaking code compatibility,
including Scala versions.   (I guess this would mean that 1.x would
stay on Scala 2.10.x)

On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza <sa...@cloudera.com> wrote:
> Bleh, hit send to early again.  My second paragraph was to argue for 1.0.0
> instead of 0.10.0, not to hammer on the binary compatibility point.
>
>



-- 
--
Evan Chan
Staff Engineer
ev@ooyala.com  |

Re: Proposal for Spark Release Strategy

Posted by Sandy Ryza <sa...@cloudera.com>.
Bleh, hit send too early again.  My second paragraph was to argue for 1.0.0
instead of 0.10.0, not to hammer on the binary compatibility point.


On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> *Would it make sense to put in something that strongly discourages binary
> incompatible changes when possible?
>
>

Re: Proposal for Spark Release Strategy

Posted by Sandy Ryza <sa...@cloudera.com>.
*Would it make sense to put in something that strongly discourages
binary-incompatible changes when possible?


On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Not codifying binary compatibility as a hard rule sounds fine to me.
>  Would it make sense to put something in that . I.e. avoid making needless
> changes to class hierarchies.
>
> Whether Spark considers itself stable or not, users are beginning to treat
> it so.  A responsible project will acknowledge this and provide the
> stability needed by its user base.  I think some projects have made the
> mistake of waiting too long to release a 1.0.0.  It allows them to put off
> making the hard decisions, but users and downstream projects suffer.
>
> If Spark needs to go through dramatic changes, there's always the option
> of a 2.0.0 that allows for this.
>
> -Sandy
>
>
>

Re: Proposal for Spark Release Strategy

Posted by Sandy Ryza <sa...@cloudera.com>.
Not codifying binary compatibility as a hard rule sounds fine to me.  Would
it make sense to put something in the guidelines that encourages it, though?
I.e. avoid making needless changes to class hierarchies.
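
To make the source-vs-binary distinction concrete, here is a minimal sketch
(hypothetical classes, not actual Spark code): adding a defaulted parameter
to a public method keeps callers compiling, but breaks already-compiled ones.

// Version A of a hypothetical public API:
class Logger {
  def log(msg: String): Unit = println(msg)
}

// Version B adds a defaulted parameter:
class Logger {
  def log(msg: String, level: String = "INFO"): Unit = println(s"[$level] $msg")
}

// A caller written as `new Logger().log("starting")` compiles cleanly against
// either version, so A and B are API (source) compatible. But an application
// already compiled against A invokes log(String), which no longer exists in
// B's bytecode, so at runtime it fails with NoSuchMethodError -- A and B are
// not binary (link-level) compatible.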

Whether Spark considers itself stable or not, users are beginning to treat
it so.  A responsible project will acknowledge this and provide the
stability needed by its user base.  I think some projects have made the
mistake of waiting too long to release a 1.0.0.  It allows them to put off
making the hard decisions, but users and downstream projects suffer.

If Spark needs to go through dramatic changes, there's always the option of
a 2.0.0 that allows for this.

-Sandy




Re: Proposal for Spark Release Strategy

Posted by Matei Zaharia <ma...@gmail.com>.
I think it’s important to do 1.0 next. The project has been around for 4
years, and I’d be comfortable maintaining the current codebase for a long
time in an API and binary compatible way through 1.x releases. Over the past
4 years we haven’t actually had major changes to the user-facing API — the
only ones were changing the package to org.apache.spark, and upgrading the
Scala version. I’d be okay leaving 1.x to always use Scala 2.10 for example,
or later cross-building it for Scala 2.11. Updating to 1.0 says two things:
it tells users that they can be confident that version will be maintained
for a long time, which we absolutely want to do, and it lets outsiders see
that the project is now fairly mature (for many people, pre-1.0 might still
cause them not to try it). I think both are good for the community.
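
Cross-building itself is mostly a build-level switch; a minimal sbt sketch
(hypothetical Scala version numbers, not Spark's actual build definition):

// build.sbt sketch -- for illustration only.
// `sbt +package` then builds the project once per listed Scala version.
scalaVersion := "2.10.3"
crossScalaVersions := Seq("2.10.3", "2.11.0")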

Regarding binary compatibility, I agree that it’s what we should strive for,
but it just seems premature to codify now. Let’s see how it works between,
say, 1.0 and 1.1, and then we can codify it.

Matei



Re: Proposal for Spark Release Strategy

Posted by Henry Saputra <he...@gmail.com>.
Thanks Patrick for initiating the discussion about the next road map for Apache Spark.

I am +1 for 0.10.0 as the next version.

It will give us as a community some time to digest the process and the
vision and make adjustments accordingly.

Releasing a 1.0.0 is a huge milestone, and if we do need to break APIs
somehow or modify internal behavior dramatically, we could take
advantage of the 1.0.0 release as the right point to do so.


- Henry




Re: Proposal for Spark Release Strategy

Posted by Andrew Ash <an...@andrewash.com>.
Agree on timeboxed releases as well.

Is there a vision for where we want to be as a project before declaring the
first 1.0 release?  While we're in the 0.x days per semver we can break
backcompat at will (though we try to avoid it where possible), and that
luxury goes away with 1.x.  I just don't want to release a 1.0 simply
because it seems to follow after 0.9 rather than making an intentional
decision that we're at the point where we can stand by the current APIs and
binary compatibility for the next year or so of the major release.

Until that decision is made as a group I'd rather we do an immediate
version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
but not the other way around.

https://github.com/apache/incubator-spark/pull/542

Cheers!
Andrew



Re: Proposal for Spark Release Strategy

Posted by Heiko Braun <ik...@googlemail.com>.
+1 on time boxed releases and compatibility guidelines



Re: Proposal for Spark Release Strategy

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Yup, the intended merge level is just a hint; the responsibility still lies
with the committers.  It can be a helpful hint, though.



Re: Proposal for Spark Release Strategy

Posted by Patrick Wendell <pw...@gmail.com>.
> How are Alpha components and higher level libraries which may add small
> features within a maintenance release going to be marked with that status?
>  Somehow/somewhere within the code itself, or just as some kind of external
> reference?

I think we'd mark alpha features as such in the java/scaladoc. This is
what Scala does with experimental features. Higher-level libraries are
anything that isn't Spark core. Maybe we can formalize this more
somehow.

We might be able to annotate the new features as experimental if they
end up in a patch release. This could make it more clear.
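
As a rough sketch of what that could look like (not an existing Spark
annotation, just an illustration of the idea):

// Sketch only: a marker annotation plus a scaladoc convention.
class AlphaComponent extends scala.annotation.StaticAnnotation

/** :: AlphaComponent ::
 *  APIs in this class may change between minor releases until the
 *  component is promoted to "stable".
 */
@AlphaComponent
class ExampleGraphOperations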

>
> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether they intend the contribution to be
> mergeable at the maintenance level, minor level, or major level.  That will
> help those of us doing code reviews and merges decide where the code should
> go and how closely to scrutinize the PR for changes that are not compatible
> with the intended release level.

I'd say the default is the minor level. If contributors know it should
be added in a maintenance release, it's great if they say so. However,
I'd say this is also a responsibility of the committers, since
individual contributors may not know. It will probably be a while
before major-level patches are being merged :P

Re: Proposal for Spark Release Strategy

Posted by Heiko Braun <ik...@googlemail.com>.
I would even take it further when it comes to PRs:

- any PR needs to reference a JIRA
- the PR should be rebased before submitting, to avoid merge commits
- as Patrick said: require squashed commits

/heiko




> On 06.02.2014 at 01:39, Mark Hamstra <ma...@clearstorydata.com> wrote:
> 
> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether they intend the contribution to be
> mergeable at the maintenance level, minor level, or major level.

Re: Proposal for Spark Release Strategy

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Looks good.

One question and one comment:

How are Alpha components and higher level libraries which may add small
features within a maintenance release going to be marked with that status?
 Somehow/somewhere within the code itself, or just as some kind of external
reference?

I would strongly encourage that developers submitting pull requests include
within the description of that PR whether they intend the contribution to be
mergeable at the maintenance level, minor level, or major level.  That will
help those of us doing code reviews and merges decide where the code should
go and how closely to scrutinize the PR for changes that are not compatible
with the intended release level.

