Posted to dev@spark.apache.org by Holden Karau <ho...@pigscanfly.ca> on 2017/03/13 19:06:47 UTC

Should we consider a Spark 2.1.1 release?

Hi Spark Devs,

Spark 2.1 has been out since the end of December
<http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Apache-Spark-2-1-0-td20390.html>
and we've got quite a few fixes merged for 2.1.1
<https://issues.apache.org/jira/browse/SPARK-18281?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC>
.
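
(That JIRA link is just the JQL query project = SPARK AND fixVersion = 2.1.1
ORDER BY updated DESC, priority DESC, created ASC, for anyone who'd rather
paste it into the issue search directly.)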

On the Python side, one of the things I'd like to see us get out in a
patch release is a packaging fix (now merged) before we upload to PyPI &
Conda; we also have the normal batch of fixes, like toLocalIterator for
large DataFrames in PySpark.
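
(For anyone who hasn't used it, toLocalIterator lets the driver walk a
DataFrame's rows incrementally, holding roughly one partition in memory at a
time instead of collect()-ing everything at once. A rough sketch, with a
purely illustrative DataFrame rather than anything tied to the specific
fixes:)

    from pyspark.sql import SparkSession

    # Illustrative only: iterate over a "large" DataFrame on the driver
    # without materializing all of it at once.
    spark = SparkSession.builder.appName("toLocalIterator-sketch").getOrCreate()
    df = spark.range(0, 1000000)  # stand-in for a large DataFrame

    total = 0
    for row in df.toLocalIterator():  # results come back incrementally
        total += row.id

    print(total)
    spark.stop()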

I've chatted with Felix & Shivaram, who seem to think the R side is looking
close to being in good shape for a 2.1.1 release to submit to CRAN (if I've
misspoken, my apologies). The two outstanding issues being tracked for R are
SPARK-18817 and SPARK-19237.

Looking quickly at the other components, it seems like Structured Streaming
could also benefit from a patch release.

What do others think - are there any issues people are actively targeting
for 2.1.1? Is this too early to be considering a patch release?

Cheers,

Holden
-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Should we consider a Spark 2.1.1 release?

Posted by Felix Cheung <fe...@hotmail.com>.
+1
there are a lot of good fixes overall and we need a release for Python and R packages.



Re: Should we consider a Spark 2.1.1 release?

Posted by Holden Karau <ho...@pigscanfly.ca>.
I think questions around how long the 1.6 series will be supported are
really important, but probably belong in a different thread than the 2.1.1
release discussion.

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Should we consider a Spark 2.1.1 release?

Posted by Ted Yu <yu...@gmail.com>.
Timur:
Mind starting a new thread?

I have the same question as you have. 


Re: Should we consider a Spark 2.1.1 release?

Posted by Timur Shenkao <ts...@timshenkao.su>.
Hello guys,

Spark benefits from stable versions, not frequent ones.
A lot of people still have 1.6.x in production. Those who want the
freshest (like me) can always deploy nightly builds.
My question is: how long will version 1.6 be supported?


Re: Should we consider a Spark 2.1.1 release?

Posted by Holden Karau <ho...@pigscanfly.ca>.
This discussion seems like it might benefit from its own thread; we've
previously decided to lengthen release cycles, but if there are different
opinions about that, it seems unrelated to the specific 2.1.1 release.

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Should we consider a Spark 2.1.1 release?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Mark,

I appreciate your comment.

My thinking is that the more frequent the minor and patch releases, the
more often end users can give them a shot and be part of the bigger
release cycle for major releases. Spark's an OSS project, we all make
mistakes, and the more eyeballs there are, the fewer mistakes slip
through. If we make very fine/minor releases often, we should be able to
attract more people who spend their time on testing/verification, which
eventually contributes to a higher quality Spark.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski




Re: Should we consider a Spark 2.1.1 release?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
That doesn't necessarily follow, Jacek. There is a point where too frequent
releases decrease quality. That is because releases don't come for free --
each one demands a considerable amount of time from release managers,
testers, etc. -- time that would otherwise typically be devoted to
improving (or at least adding to) the code. And that doesn't even begin to
consider the time that needs to be spent putting a new version into a
larger software distribution or that users need to put in to deploy and use
a new version. If you have an extremely lightweight deployment cycle, then
small, quick releases can make sense; but "lightweight" doesn't really
describe a Spark release. The concern for excessive overhead is a large
part of the thinking behind why we stretched out the roadmap to allow
longer intervals between scheduled releases. A similar concern does come
into play for unscheduled maintenance releases -- but I don't think that
that is the forcing function at this point: A 2.1.1 release is a good idea.


Re: Should we consider a Spark 2.1.1 release?

Posted by Jacek Laskowski <ja...@japila.pl>.
+10000

More, smaller, and more frequent releases (so major releases get even
higher quality).

Jacek


Re: Should we consider a Spark 2.1.1 release?

Posted by Nick Pentreath <ni...@gmail.com>.
Spark 1.5.1 had 87 issues with that fix version, 1 month after 1.5.0.

Spark 1.6.1 had 123 issues, 2 months after 1.6.0.

2.0.1 was larger (317 issues) at 3 months after 2.0.0 - which makes sense
given how large a release 2.0.0 was.

We are at 185 for 2.1.1, 3 months after 2.1.0 (and it's not released yet, so
it could slip further) - so not totally unusual, as the release interval has
certainly increased, but in fairness probably a bit later than usual. I'd
say it definitely makes sense to cut the RC!




Re: Should we consider a Spark 2.1.1 release?

Posted by Michael Armbrust <mi...@databricks.com>.
Hey Holden,

Thanks for bringing this up!  I think we usually cut patch releases when
there are enough fixes to justify it.  Sometimes just a few weeks after the
release.  I guess if we are at 3 months Spark 2.1.0 was a pretty good
release :)

That said, it is probably time. I was about to start thinking about 2.2 as
well (we are a little past the posted code-freeze deadline), so I'm happy
to push the buttons, etc. (this is a very good description
<http://spark.apache.org/release-process.html> if you are curious). I would
love help watching JIRA, posting the burn-down on issues, and shepherding in
any critical patches.  Feel free to ping me off-line if you'd like to
coordinate.

Unless there are any objections, how about we aim for an RC of 2.1.1 on
Monday and I'll also plan to cut branch-2.2 then?  (I'll send a separate
email on this as well).

Michael


Re: Should we consider a Spark 2.1.1 release?

Posted by Holden Karau <ho...@pigscanfly.ca>.
I'd be happy to do the work of coordinating a 2.1.1 release if that's a
thing a committer can do (I think the release coordinator for the most
recent Arrow release was a committer and the final publish step took a PMC
member to upload but other than that I don't remember any issues).

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Should we consider a Spark 2.1.1 release?

Posted by Sean Owen <so...@cloudera.com>.
It seems reasonable to me, in that other x.y.1 releases have followed ~2
months after the x.y.0 release and it's been about 3 months since 2.1.0.

Related: creating releases is tough work, so I feel kind of bad voting for
someone else to do that much work. Would it make sense to deputize another
release manager to help get out just the maintenance releases? This may in
turn mean maintenance branches last longer. Experienced hands can continue
to manage new minor and major releases as they require more coordination.

I know most of the release process is written down; I know it's also still
going to be work to make it 100% documented. Eventually it'll be necessary
to make sure it's entirely codified anyway.

Not pushing for it myself, just noting I had heard this brought up in side
conversations before.

>