Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2020/02/02 02:15:51 UTC

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Note that branch-3.0 was cut. Please focus on testing and polish, and let's get the release out!

On Wed, Jan 29, 2020 at 3:41 PM, Reynold Xin <rxin@databricks.com> wrote:

> 
> Just a reminder - code freeze is coming this Fri!
> 
> There can always be exceptions, but those should be exceptions and
> discussed on a case-by-case basis rather than becoming the norm.
> 
> On Tue, Dec 24, 2019 at 4:55 PM, Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
> 
>> Jan 31 sounds good to me.
>> 
>> Just curious, do we allow some exceptions to the code freeze? One case that
>> comes to mind: a feature could have multiple subtasks, where part of the
>> subtasks have been merged and the remaining subtask(s) are still in review.
>> In this case, do we allow those subtasks a few more days to get reviewed
>> and merged later?
>> 
>> Happy Holidays!
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>> 
>> On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
>> 
>>> Looks nice, happy holidays, all!
>>> 
>>> Bests,
>>> Takeshi
>>> 
>>> On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>>> 
>>>> +1 for January 31st.
>>>> 
>>>> Bests,
>>>> Dongjoon.
>>>> 
>>>> On Tue, Dec 24, 2019 at 7:11 AM Xiao Li <lixiao@databricks.com> wrote:
>>>> 
>>>>> Jan 31 is pretty reasonable. Happy Holidays!
>>>>> 
>>>>> Xiao
>>>>> 
>>>>> On Tue, Dec 24, 2019 at 5:52 AM Sean Owen <srowen@gmail.com> wrote:
>>>>> 
>>>>>> Yep, always happens. Is earlier realistic, like Jan 15? It's all arbitrary,
>>>>>> but indeed this has been in progress for a while, and there's a downside
>>>>>> to not releasing it: the gap to 3.0 grows larger.
>>>>>> On my end I don't know of anything that's holding up a release; is it
>>>>>> basically DSv2?
>>>>>> 
>>>>>> BTW these are the items still targeted to 3.0.0, some of which may not
>>>>>> have been legitimately tagged. It may be worth reviewing what's still open
>>>>>> and necessary, and what should be untargeted.
>>>>>> 
>>>>>> 
>>>>>> SPARK-29768 nondeterministic expression fails column pruning
>>>>>> SPARK-29345 Add an API that allows a user to define and observe arbitrary
>>>>>> metrics on streaming queries
>>>>>> SPARK-29348 Add observable metrics
>>>>>> SPARK-29429 Support Prometheus monitoring natively
>>>>>> SPARK-29577 Implement p-value simulation and unit tests for chi2 test
>>>>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>>>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>>>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>>>>> SPARK-28588 Build a SQL reference doc
>>>>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>>>>> SPARK-28684 Hive module support JDK 11
>>>>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after
>>>>>> some operations
>>>>>> SPARK-28264 Revisiting Python / pandas UDF
>>>>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>>>>> SPARK-28155 do not leak SaveMode to file source v2
>>>>>> SPARK-28103 Cannot infer filters from union table with empty local
>>>>>> relation table properly
>>>>>> SPARK-27986 Support Aggregate Expressions with filter
>>>>>> SPARK-28024 Incorrect numeric values when out of range
>>>>>> SPARK-27936 Support local dependency uploading from --py-files
>>>>>> SPARK-27780 Shuffle server & client should be versioned to enable smoother
>>>>>> upgrade
>>>>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of
>>>>>> joined tables > 12
>>>>>> SPARK-27471 Reorganize public v2 catalog API
>>>>>> SPARK-27520 Introduce a global config system to replace
>>>>>> hadoopConfiguration
>>>>>> SPARK-24625 put all the backward compatible behavior change configs under
>>>>>> spark.sql.legacy.*
>>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>>>>> SPARK-25383 Image data source supports sample pushdown
>>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by
>>>>>> default
>>>>>> SPARK-27296 Efficient User Defined Aggregators
>>>>>> SPARK-25128 multiple simultaneous job submissions against k8s backend
>>>>>> cause driver pods to hang
>>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>>>>> SPARK-24942 Improve cluster resource management with jobs containing
>>>>>> barrier stage
>>>>>> SPARK-25914 Separate projection from grouping and aggregate in logical
>>>>>> Aggregate
>>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>>>>> SPARK-26425 Add more constraint checks in file streaming source to avoid
>>>>>> checkpoint corruption
>>>>>> SPARK-25843 Redesign rangeBetween API
>>>>>> SPARK-25841 Redesign window function rangeBetween API
>>>>>> SPARK-25752 Add trait to easily whitelist logical operators that produce
>>>>>> named output from CleanupAliases
>>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window
>>>>>> aggregate
>>>>>> SPARK-25531 new write APIs for data source v2
>>>>>> SPARK-25547 Pluggable jdbc connection factory
>>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>>>>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>>>>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>>>>> MesosFineGrainedSchedulerBackend
>>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>>>>> SPARK-25186 Stabilize Data Source V2 API
>>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier
>>>>>> execution mode
>>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition
>>>>>> Spec
>>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>>>>> SPARK-19842 Informational Referential Integrity Constraints Support in
>>>>>> Spark
>>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list
>>>>>> of structures
>>>>>> SPARK-22386 Data Source V2 improvements
>>>>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>>>> 
>>>>>> On Mon, Dec 23, 2019 at 5:48 PM Reynold Xin <rxin@databricks.com> wrote:
>>>>>> 
>>>>>>> We've pushed out 3.0 multiple times. The latest release window documented
>>>>>>> on the website (http://spark.apache.org/versioning-policy.html) says
>>>>>>> we'd code freeze and cut branch-3.0 in early Dec. It looks like we are
>>>>>>> suffering a bit from the tragedy of the commons, in that nobody is
>>>>>>> pushing for getting the release out. I understand that each individual's
>>>>>>> natural tendency is to finish or extend the feature/bug that they have
>>>>>>> been working on. At some point we need to say "this is it" and get the
>>>>>>> release out. I'm happy to help drive this process.
>>>>>>> 
>>>>>>> To be realistic, I don't think we should just code freeze *today*.
>>>>>>> Although we have updated the website, contributors have all been
>>>>>>> operating under the assumption that all active development is still
>>>>>>> going on. I propose we *cut the branch on Jan 31, and code freeze and
>>>>>>> switch over to bug-squashing mode, and try to get the 3.0 official
>>>>>>> release out in Q1*. That is, by default no new features can go into the
>>>>>>> branch starting Jan 31.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> And happy holidays everybody.
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Databricks Summit - Watch the talks
>>>>> (https://databricks.com/sparkaisummit/north-america)
>>>>> 
>>>> 
>>> 
>>> --
>>> ---
>>> Takeshi Yamamuro
>>> 
>> 
> 

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Hyukjin Kwon <gu...@gmail.com>.
Awesome Shane.

On Wed, Feb 5, 2020 at 7:29 AM, Xiao Li <li...@databricks.com> wrote:

> Thank you, Shane!
>
> Xiao

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Xiao Li <li...@databricks.com>.
Thank you, Shane!

Xiao

On Tue, Feb 4, 2020 at 2:16 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you, Shane! :D
>
> Bests,
> Dongjoon

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, Shane! :D

Bests,
Dongjoon

On Tue, Feb 4, 2020 at 13:28 shane knapp ☠ <sk...@berkeley.edu> wrote:

> all the 3.0 builds have been created and are currently churning away!
>
>> (the failed builds were due to a silly bug in the build scripts sneaking
>> its way back in, but that's resolved now)
>
> shane

Re: Spark 3.0 branch cut and code freeze on Jan 31?

Posted by shane knapp ☠ <sk...@berkeley.edu>.
all the 3.0 builds have been created and are currently churning away!

(the failed builds were due to a silly bug in the build scripts sneaking its
way back in, but that's resolved now)

shane

On Sat, Feb 1, 2020 at 6:16 PM Reynold Xin <rx...@databricks.com> wrote:

> Note that branch-3.0 was cut. Please focus on testing and polish, and
> let's get the release out!

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu