Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2019/12/23 23:48:28 UTC

Spark 3.0 branch cut and code freeze on Jan 31?

We've pushed out 3.0 multiple times. The latest release window documented on the website ( http://spark.apache.org/versioning-policy.html ) says we'd code freeze and cut branch-3.0 in early Dec. It looks like we are suffering a bit from the tragedy of the commons, in that nobody is pushing to get the release out. I understand that each individual's natural tendency is to finish or extend the feature/bug they have been working on, but at some point we need to say "this is it" and get the release out. I'm happy to help drive this process.

To be realistic, I don't think we should just code freeze *today*. Although we have updated the website, contributors have all been operating under the assumption that all active developments are still going on. I propose we *cut the branch on Jan 31, code freeze and switch over to bug squashing mode, and try to get the 3.0 official release out in Q1*. That is, by default no new features can go into the branch starting Jan 31.

What do you think?

And happy holidays everybody.

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Hyukjin Kwon <gu...@gmail.com>.
Sounds fine.
I am trying to get the pandas UDF redesign (SPARK-28264
<https://issues.apache.org/jira/browse/SPARK-28264>) done on time. Hope I can make it.
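
For reference, SPARK-28264 tracked moving pandas UDF definitions toward Python type hints. Below is a minimal sketch of the type-hint style under discussion, assuming a PySpark 3.0 preview build with pandas and PyArrow installed; the exact semantics were still being reviewed when this was written, so treat it as illustrative rather than the final API:

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.master("local[2]").getOrCreate()

    # Series-to-series pandas UDF: the type hints, rather than an explicit
    # PandasUDFType, tell Spark how to evaluate the function.
    @pandas_udf("long")
    def plus_one(s: pd.Series) -> pd.Series:
        return s + 1

    df = spark.range(5)
    df.select(plus_one(df.id)).show()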


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Wenchen Fan <cl...@gmail.com>.
Sounds good!


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Hyukjin Kwon <gu...@gmail.com>.
Awesome Shane.

On Wed, Feb 5, 2020 at 7:29 AM, Xiao Li <li...@databricks.com> wrote:

> Thank you, Shane!
>
> Xiao
>
> On Tue, Feb 4, 2020 at 2:16 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Thank you, Shane! :D
>>
>> Bests,
>> Dongjoon
>>
>> On Tue, Feb 4, 2020 at 13:28 shane knapp ☠ <sk...@berkeley.edu> wrote:
>>
>>> all the 3.0 builds have been created and are currently churning away!
>>>
>>> (the failed builds were due to a silly bug in the build scripts sneaking
>>> its way back in, but that's resolved now)
>>>
>>> shane
>>>
>>> On Sat, Feb 1, 2020 at 6:16 PM Reynold Xin <rx...@databricks.com> wrote:
>>>
>>>> Note that branch-3.0 was cut. Please focus on testing and polish, and
>>>> let's get the release out!
>>>>
>>>>
>>>> On Wed, Jan 29, 2020 at 3:41 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>>
>>>>> Just a reminder - code freeze is coming this Fri!
>>>>>
>>>>> There can always be exceptions, but those should be exceptions and
>>>>> discussed on a case by case basis rather than becoming the norm.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 24, 2019 at 4:55 PM, Jungtaek Lim <
>>>>> kabhwan.opensource@gmail.com> wrote:
>>>>>
>>>>>> Jan 31 sounds good to me.
>>>>>>
>>>>>> Just curious: do we allow exceptions to the code freeze? One case that
>>>>>> came to mind: a feature could have multiple subtasks, where some subtasks
>>>>>> have been merged and the remaining subtask(s) are still in review. In
>>>>>> that case, do we allow those subtasks a few more days to get reviewed and
>>>>>> merged?
>>>>>>
>>>>>> Happy Holiday!
>>>>>>
>>>>>> Thanks,
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>> On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro <
>>>>>> linguin.m.s@gmail.com> wrote:
>>>>>>
>>>>>>> Looks nice, happy holiday, all!
>>>>>>>
>>>>>>> Bests,
>>>>>>> Takeshi
>>>>>>>
>>>>>>> On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun <
>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 for January 31st.
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>> On Tue, Dec 24, 2019 at 7:11 AM Xiao Li <li...@databricks.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Jan 31 is pretty reasonable. Happy Holidays!
>>>>>>>>>
>>>>>>>>> Xiao
>>>>>>>>>
>>>>>>>>> On Tue, Dec 24, 2019 at 5:52 AM Sean Owen <sr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yep, always happens. Is earlier realistic, like Jan 15? It's all
>>>>>>>>>> arbitrary, but indeed this has been in progress for a while, and
>>>>>>>>>> there's a downside to not releasing it: the gap to 3.0 grows larger.
>>>>>>>>>> On my end I don't know of anything that's holding up a release;
>>>>>>>>>> is it basically DSv2?
>>>>>>>>>>
>>>>>>>>>> BTW these are the items still targeted to 3.0.0, some of which
>>>>>>>>>> may not have been legitimately tagged. It may be worth reviewing what's
>>>>>>>>>> still open and necessary, and what should be untargeted.
>>>>>>>>>>
>>>>>>>>>> SPARK-29768 nondeterministic expression fails column pruning
>>>>>>>>>> SPARK-29345 Add an API that allows a user to define and observe
>>>>>>>>>> arbitrary metrics on streaming queries
>>>>>>>>>> SPARK-29348 Add observable metrics
>>>>>>>>>> SPARK-29429 Support Prometheus monitoring natively
>>>>>>>>>> SPARK-29577 Implement p-value simulation and unit tests for chi2
>>>>>>>>>> test
>>>>>>>>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>>>>>>>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>>>>>>>>> SPARK-28717 Update SQL ALTER TABLE RENAME  to use TableCatalog API
>>>>>>>>>> SPARK-28588 Build a SQL reference doc
>>>>>>>>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>>>>>>>>> SPARK-28684 Hive module support JDK 11
>>>>>>>>>> SPARK-28548 explain() shows wrong result for persisted DataFrames
>>>>>>>>>> after some operations
>>>>>>>>>> SPARK-28264 Revisiting Python / pandas UDF
>>>>>>>>>> SPARK-28301 fix the behavior of table name resolution with
>>>>>>>>>> multi-catalog
>>>>>>>>>> SPARK-28155 do not leak SaveMode to file source v2
>>>>>>>>>> SPARK-28103 Cannot infer filters from union table with empty
>>>>>>>>>> local relation table properly
>>>>>>>>>> SPARK-27986 Support Aggregate Expressions with filter
>>>>>>>>>> SPARK-28024 Incorrect numeric values when out of range
>>>>>>>>>> SPARK-27936 Support local dependency uploading from --py-files
>>>>>>>>>> SPARK-27780 Shuffle server & client should be versioned to enable
>>>>>>>>>> smoother upgrade
>>>>>>>>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when
>>>>>>>>>> the # of joined tables > 12
>>>>>>>>>> SPARK-27471 Reorganize public v2 catalog API
>>>>>>>>>> SPARK-27520 Introduce a global config system to replace
>>>>>>>>>> hadoopConfiguration
>>>>>>>>>> SPARK-24625 put all the backward compatible behavior change
>>>>>>>>>> configs under spark.sql.legacy.*
>>>>>>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>>>>>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>>>>>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>>>>>>>>> SPARK-25383 Image data source supports sample pushdown
>>>>>>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch
>>>>>>>>>> failures by default
>>>>>>>>>> SPARK-27296 Efficient User Defined Aggregators
>>>>>>>>>> SPARK-25128 multiple simultaneous job submissions against k8s
>>>>>>>>>> backend cause driver pods to hang
>>>>>>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>>>>>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>>>>>>>>> SPARK-24942 Improve cluster resource management with jobs
>>>>>>>>>> containing barrier stage
>>>>>>>>>> SPARK-25914 Separate projection from grouping and aggregate in
>>>>>>>>>> logical Aggregate
>>>>>>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL
>>>>>>>>>> standard
>>>>>>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>>>>>>>>> SPARK-26425 Add more constraint checks in file streaming source
>>>>>>>>>> to avoid checkpoint corruption
>>>>>>>>>> SPARK-25843 Redesign rangeBetween API
>>>>>>>>>> SPARK-25841 Redesign window function rangeBetween API
>>>>>>>>>> SPARK-25752 Add trait to easily whitelist logical operators that
>>>>>>>>>> produce named output from CleanupAliases
>>>>>>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and
>>>>>>>>>> window aggregate
>>>>>>>>>> SPARK-25531 new write APIs for data source v2
>>>>>>>>>> SPARK-25547 Pluggable jdbc connection factory
>>>>>>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>>>>>>>>> SPARK-24724 Discuss necessary info and access in barrier mode +
>>>>>>>>>> Kubernetes
>>>>>>>>>> SPARK-24725 Discuss necessary info and access in barrier mode +
>>>>>>>>>> Mesos
>>>>>>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>>>>>>>>> MesosFineGrainedSchedulerBackend
>>>>>>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>>>>>>>>> SPARK-25186 Stabilize Data Source V2 API
>>>>>>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for
>>>>>>>>>> barrier execution mode
>>>>>>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>>>>>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based
>>>>>>>>>> Partition Spec
>>>>>>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>>>>>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>>>>>>>>> SPARK-19842 Informational Referential Integrity Constraints
>>>>>>>>>> Support in Spark
>>>>>>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in
>>>>>>>>>> nested list of structures
>>>>>>>>>> SPARK-22386 Data Source V2 improvements
>>>>>>>>>> SPARK-24723 Discuss necessary info and access in barrier mode +
>>>>>>>>>> YARN
>>>>>>>>>>
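
Several items in the list above concern barrier execution mode (SPARK-24723, SPARK-24724, SPARK-24725, SPARK-24941, SPARK-24942, SPARK-25376). For context, here is a minimal PySpark sketch of the barrier API as it already existed in 2.4; those JIRAs track extending and hardening it, and the RDDBarrier.coalesce() proposed in SPARK-24941 is deliberately not assumed here:

    from pyspark import BarrierTaskContext
    from pyspark.sql import SparkSession

    # Barrier mode needs enough slots to run every task in the stage at once,
    # hence local[4] for a 4-partition RDD.
    spark = SparkSession.builder.master("local[4]").getOrCreate()

    def sync_then_sum(iterator):
        ctx = BarrierTaskContext.get()
        ctx.barrier()  # all tasks in the barrier stage rendezvous here
        return [(ctx.partitionId(), sum(iterator))]

    rdd = spark.sparkContext.parallelize(range(8), 4)
    print(rdd.barrier().mapPartitions(sync_then_sum).collect())
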
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ---
>>>>>>> Takeshi Yamamuro
>>>>>>>
>>>>>>
>>>>
>>>
>>> --
>>> Shane Knapp
>>> Computer Guy / Voice of Reason
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Xiao Li <li...@databricks.com>.
Thank you, Shane!

Xiao


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, Shane! :D

Bests,
Dongjoon


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by shane knapp ☠ <sk...@berkeley.edu>.
all the 3.0 builds have been created and are currently churning away!

(the failed builds were due to a silly bug in the build scripts sneaking its
way back in, but that's resolved now)

shane


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Reynold Xin <rx...@databricks.com>.
Note that branch-3.0 was cut. Please focus on testing and polish, and let's get the release out!


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Reynold Xin <rx...@databricks.com>.
Just a reminder - code freeze is coming this Fri!

There can always be exceptions, but those should be exceptions and discussed on a case by case basis rather than becoming the norm.


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Jungtaek Lim <ka...@gmail.com>.
Jan 31 sounds good to me.

Just curious, do we allow any exceptions to the code freeze? One case that
comes to mind is a feature with multiple subtasks, where some subtasks have
already been merged and the remaining one(s) are still under review. In this
case, do we allow those subtasks a few more days to get reviewed and merged?

Happy Holidays!

Thanks,
Jungtaek Lim (HeartSaVioR)


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Takeshi Yamamuro <li...@gmail.com>.
Looks nice. Happy holidays, all!

Bests,
Takeshi


-- 
---
Takeshi Yamamuro

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Dongjoon Hyun <do...@gmail.com>.
+1 for January 31st.

Bests,
Dongjoon.


Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Xiao Li <li...@databricks.com>.
Jan 31 is pretty reasonable. Happy Holidays!

Xiao


-- 
Databricks Summit - Watch the talks:
<https://databricks.com/sparkaisummit/north-america>

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Sean Owen <sr...@gmail.com>.
Yep, always happens. Is earlier realistic, like Jan 15? It's all arbitrary,
but indeed this has been in progress for a while, and there's a downside to
not releasing it: it makes the gap to 3.0 larger.
On my end I don't know of anything that's holding up a release; is it
basically DSv2?

BTW these are the items still targeted to 3.0.0, some of which may not have
been legitimately tagged. It may be worth reviewing what's still open and
necessary, and what should be untargeted.

SPARK-29768 nondeterministic expression fails column pruning
SPARK-29345 Add an API that allows a user to define and observe arbitrary
metrics on streaming queries
SPARK-29348 Add observable metrics
SPARK-29429 Support Prometheus monitoring natively
SPARK-29577 Implement p-value simulation and unit tests for chi2 test
SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
SPARK-28588 Build a SQL reference doc
SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
SPARK-28684 Hive module support JDK 11
SPARK-28548 explain() shows wrong result for persisted DataFrames after
some operations
SPARK-28264 Revisiting Python / pandas UDF
SPARK-28301 fix the behavior of table name resolution with multi-catalog
SPARK-28155 do not leak SaveMode to file source v2
SPARK-28103 Cannot infer filters from union table with empty local relation
table properly
SPARK-27986 Support Aggregate Expressions with filter
SPARK-28024 Incorrect numeric values when out of range
SPARK-27936 Support local dependency uploading from --py-files
SPARK-27780 Shuffle server & client should be versioned to enable smoother
upgrade
SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of
joined tables > 12
SPARK-27471 Reorganize public v2 catalog API
SPARK-27520 Introduce a global config system to replace hadoopConfiguration
SPARK-24625 put all the backward compatible behavior change configs under
spark.sql.legacy.*
SPARK-24941 Add RDDBarrier.coalesce() function
SPARK-25017 Add test suite for ContextBarrierState
SPARK-25083 remove the type erasure hack in data source scan
SPARK-25383 Image data source supports sample pushdown
SPARK-27272 Enable blacklisting of node/executor on fetch failures by
default
SPARK-27296 Efficient User Defined Aggregators
SPARK-25128 multiple simultaneous job submissions against k8s backend cause
driver pods to hang
SPARK-26664 Make DecimalType's minimum adjusted scale configurable
SPARK-21559 Remove Mesos fine-grained mode
SPARK-24942 Improve cluster resource management with jobs containing
barrier stage
SPARK-25914 Separate projection from grouping and aggregate in logical
Aggregate
SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
SPARK-26221 Improve Spark SQL instrumentation and metrics
SPARK-26425 Add more constraint checks in file streaming source to avoid
checkpoint corruption
SPARK-25843 Redesign rangeBetween API
SPARK-25841 Redesign window function rangeBetween API
SPARK-25752 Add trait to easily whitelist logical operators that produce
named output from CleanupAliases
SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window
aggregate
SPARK-25531 new write APIs for data source v2
SPARK-25547 Pluggable jdbc connection factory
SPARK-20845 Support specification of column names in INSERT INTO
SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
SPARK-25074 Implement maxNumConcurrentTasks() in
MesosFineGrainedSchedulerBackend
SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
SPARK-25186 Stabilize Data Source V2 API
SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier
execution mode
SPARK-7768 Make user-defined type (UDT) API public
SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
SPARK-15694 Implement ScriptTransformation in sql/core
SPARK-18134 SQL: MapType in Group BY and Joins not working
SPARK-19842 Informational Referential Integrity Constraints Support in Spark
SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list
of structures
SPARK-22386 Data Source V2 improvements
SPARK-24723 Discuss necessary info and access in barrier mode + YARN
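
A minimal sketch for regenerating this list from the ASF JIRA REST API, in
case that helps the review. The search endpoint shape and the
"Target Version/s" custom field name are assumptions worth double-checking:

    # Hypothetical sketch: print unresolved SPARK issues targeted to 3.0.0.
    # Assumes JIRA's REST v2 search endpoint and the "Target Version/s"
    # custom field; verify both against the live instance before relying on it.
    import requests

    JQL = ('project = SPARK AND resolution = Unresolved '
           'AND "Target Version/s" = 3.0.0')
    resp = requests.get(
        "https://issues.apache.org/jira/rest/api/2/search",
        params={"jql": JQL, "fields": "summary", "maxResults": 100},
    )
    resp.raise_for_status()
    for issue in resp.json()["issues"]:
        print(issue["key"], issue["fields"]["summary"])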

