
Posted to dev@spark.apache.org by 郑瑞峰 <ru...@foxmail.com> on 2020/08/05 02:54:43 UTC

Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,
I am going to prepare the release of 3.0.1 RC1, with the help of Wenchen.




------------------ Original Message ------------------
From: "Jason Moore" <Jason.Moore@quantium.com.au.INVALID>
Sent: Thursday, July 30, 2020, 10:35 AM
To: "dev" <dev@spark.apache.org>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,

Discussion around 3.0.1 seems to have trickled away. What was blocking the release process from kicking off? I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released.

Cheers,

Jason.

From: Takeshi Yamamuro <linguin.m.s@gmail.com>
Date: Wednesday, 15 July 2020 at 9:00 am
To: Shivaram Venkataraman <shivaram@eecs.berkeley.edu>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.

I don't see any on-going blocker in my area.
Thanks for the notification.

Bests,
Takeshi

On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:

Hi, Yi.

Could you explain why you think that is a blocker? For the given example from the JIRA description,

    spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
    Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
    checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)

Apache Spark 3.0.0 seems to work like the following.

    scala> spark.version
    res0: String = 3.0.0

    scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
    res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)

    scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

    scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect
    res3: Array[org.apache.spark.sql.Row] = Array([1])

Could you provide a reproducible example?

Bests,
Dongjoon.

On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <yi.wu@databricks.com> wrote:

This is probably a blocker: https://issues.apache.org/jira/browse/SPARK-32307

On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <srowen@gmail.com> wrote:

https://issues.apache.org/jira/browse/SPARK-32234 ?

On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
<shivaram@eecs.berkeley.edu> wrote:
>
> Hi all
>
> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.
>
> Thanks
> Shivaram
>

-- 
    
---
 Takeshi Yamamuro
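
As a side note on why the example above expects `Row(1)`: the body of the `key` UDF can be checked in plain Scala, without Spark. The sketch below is only an illustration. The object name `KeyUdfSketch` is made up, and it says nothing about how Spark itself materializes the map before calling the UDF; it only shows what the lambda from the thread does with the sample map.

```scala
object KeyUdfSketch {
  // Same body as the lambda registered as the "key" UDF in the thread.
  val key: Map[String, String] => Int = m => m.keys.head.toInt

  def main(args: Array[String]): Unit = {
    // Sample row from the JIRA example quoted above.
    val row = Map("1" -> "one", "2" -> "two")
    // Small immutable Scala maps (up to four entries) iterate in insertion
    // order, so the first key here is "1".
    println(key(row))
  }
}
```

Because the lambda takes whichever key comes first, its result on larger maps depends on iteration order, which is worth keeping in mind when reading the GROUP BY discussion.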

Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Yi Wu <yi...@databricks.com>.

Hi Ruifeng, thank you for your work. I have a backport PR for 3.0:
https://github.com/apache/spark/pull/29395. It is waiting for tests now.

Best,
Yi

On Wed, Aug 5, 2020 at 10:57 AM 郑瑞峰 <ru...@foxmail.com> wrote:

> Hi all,
> I am going to prepare the release of 3.0.1 RC1, with the help of Wenchen.

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Yuming Wang <wg...@gmail.com>.

Another correctness issue: https://issues.apache.org/jira/browse/SPARK-32659

On Tue, Aug 25, 2020 at 11:25 PM Sean Owen <sr...@gmail.com> wrote:

> That isn't a blocker (see comments - not a regression).
> That said I think we have a fix ready to merge now, if there are no
> objections.
>
> On Tue, Aug 25, 2020 at 10:24 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
> >
> > For the correctness blocker, we have the following, Tom.
> >
> > - https://issues.apache.org/jira/browse/SPARK-32614
> > - https://github.com/apache/spark/pull/29516
> >
> > Bests,
> > Dongjoon.
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Sean Owen <sr...@gmail.com>.

That isn't a blocker (see comments - not a regression).
That said I think we have a fix ready to merge now, if there are no objections.

On Tue, Aug 25, 2020 at 10:24 AM Dongjoon Hyun <do...@gmail.com> wrote:
>
> For the correctness blocker, we have the following, Tom.
>
> - https://issues.apache.org/jira/browse/SPARK-32614
> - https://github.com/apache/spark/pull/29516
>
> Bests,
> Dongjoon.
>


Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Dongjoon Hyun <do...@gmail.com>.

For the correctness blocker, we have the following, Tom.

- https://issues.apache.org/jira/browse/SPARK-32614
- https://github.com/apache/spark/pull/29516

Bests,
Dongjoon.

On Tue, Aug 25, 2020 at 6:32 AM Tom Graves <tg...@yahoo.com.invalid>
wrote:

> Hey,
>
> I'm just curious what the status of the 3.0.1 release is?  Do we have some
> blockers we are waiting on?
>
> Thanks,
> Tom
>
> On Sunday, August 16, 2020, 09:07:44 PM CDT, ruifengz <
> ruifengz@foxmail.com> wrote:
>
>
> Thanks for letting us know about this issue.
>
>
> On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:
>
> I've checked the Jenkins log and it seems the commit from
> https://github.com/apache/spark/pull/29404 caused the failure.
>
>
> On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <ko...@tresata.com> wrote:
>
> i noticed a commit today that seems to prepare for 3.0.1-rc1:
> commit 05144a5c10cd37ebdbb55fde37d677def49af11f
> Author: Ruifeng Zheng <ru...@apache.org>
> Date:   Sat Aug 15 01:37:47 2020 +0000
>
>     Preparing Spark release v3.0.1-rc1
>
> so i tried to build spark on that commit and i get a failure in sql:
>
> 09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in
> stage 77.0 failed 1 times; aborting job
> [info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED ***
> (306 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost
> task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver):
> java.lang.ArithmeticException:
> Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be
> represented as Decimal(38, 18).
> [info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
> Source)
> [info] at
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info] at
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
> [info] at
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
> [info] at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
> [info] at
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
> [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> [info] at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
> [info] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] at java.lang.Thread.run(Thread.java:748)
>
> [error] Failed tests:
> [error] org.apache.spark.sql.DataFrameSuite
>
> On Thu, Aug 13, 2020 at 8:19 PM Jason Moore
> <Ja...@quantium.com.au.invalid> wrote:
>
> Thank you so much!  Any update on getting the RC1 up for vote?
>
> Jason.
>
> --
> ---
> Takeshi Yamamuro
>

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Tom Graves <tg...@yahoo.com.INVALID>.

Hey,

I'm just curious what the status of the 3.0.1 release is? Do we have some blockers we are waiting on?

Thanks,
Tom

On Sunday, August 16, 2020, 09:07:44 PM CDT, ruifengz <ru...@foxmail.com> wrote:

Thanks for letting us know about this issue.

On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:

I've checked the Jenkins log and it seems the commit from https://github.com/apache/spark/pull/29404 caused the failure.

On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <ko...@tresata.com> wrote:

i noticed a commit today that seems to prepare for 3.0.1-rc1:

commit 05144a5c10cd37ebdbb55fde37d677def49af11f
Author: Ruifeng Zheng <ru...@apache.org>
Date:   Sat Aug 15 01:37:47 2020 +0000

    Preparing Spark release v3.0.1-rc1

so i tried to build spark on that commit and i get a failure in sql:

09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)

[error] Failed tests:
[error] org.apache.spark.sql.DataFrameSuite

On Thu, Aug 13, 2020 at 8:19 PM Jason Moore <Ja...@quantium.com.au.invalid> wrote:

Thank you so much!  Any update on getting the RC1 up for vote?

Jason.


Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by ruifengz <ru...@foxmail.com>.

Thanks for letting us know about this issue.


On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:
> I've checked the Jenkins log and it seems the commit from
> https://github.com/apache/spark/pull/29404 caused the failure.
>
>
> On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <koert@tresata.com> wrote:
>
>     i noticed a commit today that seems to prepare for 3.0.1-rc1:
>     commit 05144a5c10cd37ebdbb55fde37d677def49af11f
>     Author: Ruifeng Zheng <ruifengz@apache.org>
>     Date:   Sat Aug 15 01:37:47 2020 +0000
>
>         Preparing Spark release v3.0.1-rc1
>
>     so i tried to build spark on that commit and i get a failure in sql:
>
>     09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task
>     0 in stage 77.0 failed 1 times; aborting job
>     [info] - SPARK-28224: Aggregate sum big decimal overflow ***
>     FAILED *** (306 milliseconds)
>     [info]   org.apache.spark.SparkException: Job aborted due to stage
>     failure: Task 0 in stage 77.0 failed 1 times, most recent failure:
>     Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor
>     driver): java.lang.ArithmeticException:
>     Decimal(expanded,111111111111111111110.246000000000000000,39,18})
>     cannot be represented as Decimal(38, 18).
>     [info] at
>     org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
>     [info] at
>     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown
>     Source)
>     [info] at
>     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown
>     Source)
>     [info] at
>     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown
>     Source)
>     [info] at
>     org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>     Source)
>     [info] at
>     org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     [info] at
>     org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
>     [info] at
>     scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>     [info] at
>     scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>     [info] at
>     org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
>     [info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
>     [info] at
>     org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
>     [info] at
>     org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
>     [info] at
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     [info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
>     [info] at
>     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>     [info] at
>     org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>     [info] at
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>     [info] at
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     [info] at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     [info] at java.lang.Thread.run(Thread.java:748)
>
>     [error] Failed tests:
>     [error] org.apache.spark.sql.DataFrameSuite
>
>     On Thu, Aug 13, 2020 at 8:19 PM Jason Moore
>     <Ja...@quantium.com.au.invalid> wrote:
>
>         Thank you so much!  Any update on getting the RC1 up for vote?
>
>         Jason.
>
> -- 
> ---
> Takeshi Yamamuro
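
The `ArithmeticException` in the quoted log comes down to digit counting: the summed value carries 21 integer digits plus 18 fractional digits, which is 39 digits in total, one more than `Decimal(38, 18)` allows. A minimal plain-Scala sketch of that check follows. It uses `java.math.BigDecimal` rather than Spark's own `Decimal` class, the names `DecimalFitCheck` and `fitsIn` are hypothetical, and it ignores rounding carry, so it illustrates the arithmetic rather than Spark's implementation.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalFitCheck {
  // Hypothetical helper: true when the integer part of `value` fits into a
  // decimal type with the given precision and scale. A rounding carry when
  // the fractional part is shortened is ignored in this sketch.
  def fitsIn(value: JBigDecimal, precision: Int, scale: Int): Boolean = {
    // Digits to the left of the decimal point, independent of trailing zeros.
    val integerDigits = math.max(value.precision - value.scale, 0)
    integerDigits <= precision - scale
  }

  def main(args: Array[String]): Unit = {
    // Literal copied from the exception message in the log above.
    val sum = new JBigDecimal("111111111111111111110.246000000000000000")
    println(s"precision=${sum.precision}, scale=${sum.scale}")
    println(s"fits Decimal(38, 18): ${fitsIn(sum, 38, 18)}")
  }
}
```

With scale fixed at 18, a 38-digit decimal leaves room for only 20 integer digits, which is why a sum with 21 integer digits cannot be represented.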

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Takeshi Yamamuro <li...@gmail.com>.

I've checked the Jenkins log and it seems the commit from
https://github.com/apache/spark/pull/29404 caused the failure.


On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <ko...@tresata.com> wrote:

> i noticed a commit today that seems to prepare for 3.0.1-rc1:
> commit 05144a5c10cd37ebdbb55fde37d677def49af11f
> Author: Ruifeng Zheng <ru...@apache.org>
> Date:   Sat Aug 15 01:37:47 2020 +0000
>
>     Preparing Spark release v3.0.1-rc1
>
> so i tried to build spark on that commit and i get a failure in sql:
>
> 09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in
> stage 77.0 failed 1 times; aborting job
> [info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED ***
> (306 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost
> task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver):
> java.lang.ArithmeticException:
> Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be
> represented as Decimal(38, 18).
> [info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown
> Source)
> [info] at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
> Source)
> [info] at
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info] at
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
> [info] at
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
> [info] at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
> [info] at
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
> [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> [info] at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
> [info] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] at java.lang.Thread.run(Thread.java:748)
>
> [error] Failed tests:
> [error] org.apache.spark.sql.DataFrameSuite
>
> On Thu, Aug 13, 2020 at 8:19 PM Jason Moore
> <Ja...@quantium.com.au.invalid> wrote:
>
>> Thank you so much!  Any update on getting the RC1 up for vote?
>>
>> Jason.
>>
>>
>> ------------------------------
>> *From:* 郑瑞峰 <ru...@foxmail.com>
>> *Sent:* Wednesday, 5 August 2020 12:54 PM
>> *To:* Jason Moore <Ja...@quantium.com.au.INVALID>; Spark dev list <
>> dev@spark.apache.org>
>> *Subject:* 回复： [DISCUSS] Apache Spark 3.0.1 Release
>>
>> Hi all,
>> I am going to prepare the realease of 3.0.1 RC1, with the help of Wenchen.
>>
>>
>> ------------------ 原始邮件 ------------------
>> *发件人:* "Jason Moore" <Ja...@quantium.com.au.INVALID>;
>> *发送时间:* 2020年7月30日(星期四) 上午10:35
>> *收件人:* "dev"<de...@spark.apache.org>;
>> *主题:* Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>> Hi all,
>>
>>
>>
>> Discussion around 3.0.1 seems to have trickled away.  What was blocking
>> the release process kicking off?  I can see some unresolved bugs raised
>> against 3.0.0, but conversely there were quite a few critical correctness
>> fixes waiting to be released.
>>
>>
>>
>> Cheers,
>>
>> Jason.
>>
>>
>>
>> *From: *Takeshi Yamamuro <li...@gmail.com>
>> *Date: *Wednesday, 15 July 2020 at 9:00 am
>> *To: *Shivaram Venkataraman <sh...@eecs.berkeley.edu>
>> *Cc: *"dev@spark.apache.org" <de...@spark.apache.org>
>> *Subject: *Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>>
>>
>> > Just wanted to check if there are any blockers that we are still
>> waiting for to start the new release process.
>>
>> I don't see any on-going blocker in my area.
>>
>> Thanks for the notification.
>>
>>
>>
>> Bests,
>>
>> Tkaeshi
>>
>>
>>
>> On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>> Hi, Yi.
>>
>>
>>
>> Could you explain why you think that is a blocker? For the given example
>> from the JIRA description,
>>
>>
>>
>> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
>>
>> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
>>
>> checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)
>>
>>
>>
>> Apache Spark 3.0.0 seems to work like the following.
>>
>>
>>
>> scala> spark.version
>>
>> res0: String = 3.0.0
>>
>>
>>
>> scala> spark.udf.register("key", udf((m: Map[String, String]) =>
>> m.keys.head.toInt))
>>
>> res1: org.apache.spark.sql.expressions.UserDefinedFunction =
>> SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]:
>> map<string,string>])),None,false,true)
>>
>>
>>
>> scala> Seq(Map("1" -> "one", "2" ->
>> "two")).toDF("a").createOrReplaceTempView("t")
>>
>>
>>
>> scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect
>>
>> res3: Array[org.apache.spark.sql.Row] = Array([1])
>>
>>
>>
>> Could you provide a reproducible example?
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <yi...@databricks.com> wrote:
>>
>> This probably be a blocker:
>> https://issues.apache.org/jira/browse/SPARK-32307
>>
>>
>>
>> On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <sr...@gmail.com> wrote:
>>
>> https://issues.apache.org/jira/browse/SPARK-32234 ?
>>
>> On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
>> <sh...@eecs.berkeley.edu> wrote:
>> >
>> > Hi all
>> >
>> > Just wanted to check if there are any blockers that we are still
>> waiting for to start the new release process.
>> >
>> > Thanks
>> > Shivaram
>> >
>>
>>
>>
>>
>> --
>>
>> ---
>> Takeshi Yamamuro
>>
>

-- 
---
Takeshi Yamamuro

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Koert Kuipers <ko...@tresata.com>.

I noticed a commit today that seems to prepare for 3.0.1-rc1:
commit 05144a5c10cd37ebdbb55fde37d677def49af11f
Author: Ruifeng Zheng <ru...@apache.org>
Date:   Sat Aug 15 01:37:47 2020 +0000

    Preparing Spark release v3.0.1-rc1

So I tried to build Spark on that commit, and I get a failure in sql:

09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)

[error] Failed tests:
[error] org.apache.spark.sql.DataFrameSuite
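
A note on the error message above: the reported value carries a precision of 39 with a scale of 18, which is one digit more than Decimal(38, 18) allows. The following is a minimal sketch of that digit arithmetic using plain java.math.BigDecimal. It is not Spark's Decimal implementation, and the names DecimalPrecisionSketch, MaxPrecision, Scale and fits are hypothetical, chosen only for illustration.

```scala
import java.math.{BigDecimal, RoundingMode}

object DecimalPrecisionSketch {
  // Assumed limits, taken from the "Decimal(38, 18)" target in the error message.
  val MaxPrecision = 38
  val Scale = 18

  /** True when `value`, rescaled to `Scale` fractional digits,
    * needs at most `MaxPrecision` total digits. */
  def fits(value: BigDecimal): Boolean =
    value.setScale(Scale, RoundingMode.HALF_UP).precision() <= MaxPrecision

  def main(args: Array[String]): Unit = {
    // The value reported in the failing test's ArithmeticException.
    val sum = new BigDecimal("111111111111111111110.246000000000000000")
    println(s"precision=${sum.precision()}, scale=${sum.scale()}, fits=${fits(sum)}")
  }
}
```

With 18 digits reserved for the fraction, only 20 integer digits remain under these assumed limits, so a sum that grows to 21 integer digits cannot be represented.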

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Posted by Jason Moore <Ja...@quantium.com.au.INVALID>.

Thank you so much!  Any update on getting the RC1 up for vote?

Jason.

________________________________
From: 郑瑞峰 <ru...@foxmail.com>
Sent: Wednesday, 5 August 2020 12:54 PM
To: Jason Moore <Ja...@quantium.com.au.INVALID>; Spark dev list <de...@spark.apache.org>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,
I am going to prepare the release of 3.0.1 RC1, with the help of Wenchen.


------------------ Original Message ------------------
From: "Jason Moore" <Ja...@quantium.com.au.INVALID>;
Sent: Thursday, July 30, 2020, 10:35 AM
To: "dev"<de...@spark.apache.org>;
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release


Hi all,



Discussion around 3.0.1 seems to have trickled away.  What was blocking the release process kicking off?  I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released.



Cheers,

Jason.



From: Takeshi Yamamuro <li...@gmail.com>
Date: Wednesday, 15 July 2020 at 9:00 am
To: Shivaram Venkataraman <sh...@eecs.berkeley.edu>
Cc: "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release



> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.

I don't see any on-going blocker in my area.

Thanks for the notification.



Bests,

Takeshi



On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <do...@gmail.com> wrote:

Hi, Yi.



Could you explain why you think that is a blocker? For the given example from the JIRA description,



spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)



Apache Spark 3.0.0 seems to work like the following.



scala> spark.version

res0: String = 3.0.0



scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)



scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")



scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect

res3: Array[org.apache.spark.sql.Row] = Array([1])



Could you provide a reproducible example?



Bests,

Dongjoon.





On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <yi...@databricks.com> wrote:

This is probably a blocker: https://issues.apache.org/jira/browse/SPARK-32307



On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <sr...@gmail.com> wrote:

https://issues.apache.org/jira/browse/SPARK-32234 ?

On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
<sh...@eecs.berkeley.edu> wrote:
>
> Hi all
>
> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.
>
> Thanks
> Shivaram
>




--

---
Takeshi Yamamuro