Posted to dev@spark.apache.org by Xiao Li <ga...@gmail.com> on 2019/12/09 06:32:05 UTC

Spark 3.0 preview release 2?

I have received a lot of great feedback from the community about the recent
3.0 preview release. Since that release, we have already accumulated 353
commits [https://github.com/apache/spark/compare/v3.0.0-preview...master].
There are various important features and behavior changes that we want the
community to try before we enter the official release candidates of Spark
3.0.


Below is my selection of items that are not part of the last 3.0 preview but
are already available in the upstream master branch (rough usage sketches
for several of them follow the list):


   - Support JDK 11 with Hadoop 2.7
   - Spark SQL will respect its own default format (i.e., parquet) when
   users do CREATE TABLE without USING or STORED AS clauses
   - Enable Parquet nested schema pruning and nested pruning on expressions
   by default
   - Add observable metrics for streaming queries
   - Column pruning through nondeterministic expressions
   - RecordBinaryComparator should check endianness when comparing by long
   - Improve parallelism for the local shuffle reader in adaptive query
   execution
   - Upgrade Apache Arrow to version 0.15.1
   - Various interval-related SQL support
   - Add a mode to pin Python threads to JVM threads
   - Provide an option to clean up completed files in a streaming query
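
For the default-format change, here is a minimal sketch of what the new
behavior should look like when running against current master (the table
name is made up for illustration):

    // With the new default, a plain CREATE TABLE without USING or
    // STORED AS should pick up spark.sql.sources.default (parquet)
    // instead of the Hive text serde.
    spark.sql("CREATE TABLE events (id BIGINT, name STRING)")
    // The provider reported for the table should now be parquet:
    spark.sql("DESCRIBE EXTENDED events").show(100, truncate = false)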
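
For nested schema pruning, a sketch of how to observe the pruning; the
config name below is taken from master and the input path is hypothetical:

    import org.apache.spark.sql.functions.col

    // Now enabled by default on master; set explicitly here for clarity.
    spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
    val df = spark.read.parquet("/tmp/nested_data")
    // Selecting a single struct field should read only that leaf column
    // from Parquet; ReadSchema in the plan should show the pruned struct.
    df.select(col("person.name")).explain()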
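
For observable metrics on streaming queries, a sketch of the API as I
understand it on master (Dataset.observe plus the observedMetrics map on
progress events); the metric and column names are made up:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // The rate source is used only to have a runnable input stream.
    val input = spark.readStream.format("rate").load()
    val observed = input.observe(
      "input_metrics",
      count(lit(1)).as("row_count"),
      sum(col("value")).as("value_sum"))

    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = {}
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        // observedMetrics is a java.util.Map[String, Row].
        val metrics = event.progress.observedMetrics.get("input_metrics")
        if (metrics != null) {
          println(s"rows=${metrics.getAs[Long]("row_count")}")
        }
      }
    })
    // Start a query on `observed` as usual; each progress event then
    // carries the metrics registered under the name given above.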
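
For adaptive query execution, enabling it is just a conf flag;
spark.sql.adaptive.enabled is the documented switch, while the exact name
of the local-shuffle-reader flag below is my assumption from master and may
still change before the release:

    spark.conf.set("spark.sql.adaptive.enabled", "true")
    // Assumed flag name for the improved local shuffle reader:
    spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", "true")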
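
For cleaning up completed files in a streaming query, a sketch of the new
file-source options as I read them on master; the paths and schema here are
hypothetical:

    import org.apache.spark.sql.types.StructType

    val schema = new StructType().add("id", "long").add("name", "string")
    val stream = spark.readStream
      .schema(schema)
      // "archive" moves completed files to sourceArchiveDir;
      // "delete" removes them; "off" (the default) leaves them in place.
      .option("cleanSource", "archive")
      .option("sourceArchiveDir", "/data/archive")
      .json("/data/incoming")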

I am wondering whether we can have another preview release for Spark 3.0.
This would help us find design/API defects as early as possible and avoid
significantly delaying the upcoming Spark 3.0 release.


Also, is any committer willing to volunteer as the release manager of the
next preview release of Spark 3.0, if we have such a release?


Cheers,


Xiao

Re: Spark 3.0 preview release 2?

Posted by Yuming Wang <wg...@gmail.com>.
I'd like to volunteer as the release manager for the next Spark 3.0 preview.


Re: Spark 3.0 preview release 2?

Posted by Reynold Xin <rx...@databricks.com>.
If the cost is low, why don't we just do monthly previews until we reach code freeze? If it is high, maybe we should discuss and do it when there are people who volunteer.


Re: Spark 3.0 preview release 2?

Posted by Xiao Li <li...@databricks.com>.
Hi Yuming,

Thank you, @Wang, Yuming <yu...@ebay.com>! It sounds like everyone is
fine with releasing a new Spark 3.0 preview. Could you start working on
it?

Thanks,

Xiao


Re: Spark 3.0 preview release 2?

Posted by Dongjoon Hyun <do...@gmail.com>.
BTW, our Jenkins seems to be behind.

1. For the first item, `Support JDK 11 with Hadoop 2.7`:
    At least, we need a new Jenkins job
`spark-master-test-maven-hadoop-2.7-jdk-11/`.
2. https://issues.apache.org/jira/browse/SPARK-28900 (Test Pyspark, SparkR
on JDK 11 with run-tests)
3. https://issues.apache.org/jira/browse/SPARK-29988 (Adjust Jenkins jobs
for `hive-1.2/2.3` combination)

It would be great if we could finish the above three items before mentioning
them in the release notes of the next preview.

Bests,
Dongjoon.



Re: Spark 3.0 preview release 2?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
+1 for another preview

Tom

Re: Spark 3.0 preview release 2?

Posted by Matei Zaharia <ma...@gmail.com>.
Yup, it would be great to release these more often.


Re: Spark 3.0 preview release 2?

Posted by Takeshi Yamamuro <li...@gmail.com>.
+1; it would be great if we can, in terms of gathering user feedback.

Bests,
Takeshi


-- 
---
Takeshi Yamamuro

Re: Spark 3.0 preview release 2?

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, All.

+1 for another `3.0-preview`.

Also, thank you Yuming for volunteering for that!

Bests,
Dongjoon.



Re: Spark 3.0 preview release 2?

Posted by Xiao Li <li...@databricks.com>.
Once we enter the official release candidates, new features whose fixes are
not trivial will have to be disabled or even reverted [if no conf is
available to turn them off]; otherwise, we might need 10+ RCs to make the
final release. Based on previous discussions, new features should not block
the release.

I agree we should have a code freeze at the beginning of 2020. The preview
releases should not block the official releases; a preview is just a way to
collect more feedback about these new features and behavior changes.

Also, for the Spark 3.0 release, we still need the Hive community to do us
a favor and release 2.3.7 so that we pick up HIVE-22190
<https://issues.apache.org/jira/browse/HIVE-22190>. Before asking the Hive
community for a 2.3.7 release, we would like the Spark community to
exercise the bits more, especially JDK 11 support on Hadoop 2.7 and 3.2,
which is based on the Hive 2.3 execution JAR. During the preview stage, we
might find more issues that are not covered by our test cases.




Re: Spark 3.0 preview release 2?

Posted by Sean Owen <sr...@gmail.com>.
Seems fine to me, of course. Honestly, that wouldn't be a bad result for
a release candidate, though we would probably roll another one now.
How about simply moving to a release candidate? If not now, then at
least move to code freeze from the start of 2020. There is also some
downside in pushing the 3.0 release out further with previews.
