Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/17 15:49:00 UTC

[GitHub] [spark] wangyum opened a new pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

wangyum opened a new pull request #29790:
URL: https://github.com/apache/spark/pull/29790


   ### What changes were proposed in this pull request?
   
   The data type of some expressions, e.g. `CaseWhen`, is not a static value; it is recomputed on every call to `dataType`.
   We should avoid computing `dataType` multiple times for each expression.
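
   A minimal sketch of the caching idea, using a toy expression model rather than Spark's classes. `SimpleType`, `Lit`, `UncachedCaseWhen`, `CachedCaseWhen`, and the `computations` counter are hypothetical names introduced only for illustration; the point is the `private lazy val` behind `dataType`.

   ```scala
   // Toy model, not Spark's classes: every name here is hypothetical.
   sealed trait SimpleType
   case object LongT extends SimpleType
   case object StringT extends SimpleType

   trait Expr {
     def dataType: SimpleType
   }

   final case class Lit(dataType: SimpleType) extends Expr

   // Re-derives the merged type on every call to dataType.
   final case class UncachedCaseWhen(branches: Seq[Expr]) extends Expr {
     var computations = 0 // counts how often the type is derived
     override def dataType: SimpleType = {
       computations += 1
       require(branches.map(_.dataType).distinct.size == 1, "branch types differ")
       branches.head.dataType
     }
   }

   // Derives the merged type once; later calls read the cached value.
   final case class CachedCaseWhen(branches: Seq[Expr]) extends Expr {
     var computations = 0
     private lazy val internalDataType: SimpleType = {
       computations += 1
       require(branches.map(_.dataType).distinct.size == 1, "branch types differ")
       branches.head.dataType
     }
     override def dataType: SimpleType = internalDataType
   }

   object DataTypeCachingSketch {
     // Calls dataType `calls` times on both variants and reports how often
     // each one actually derived its type.
     def computationsAfter(calls: Int): (Int, Int) = {
       val branches = Seq(Lit(LongT), Lit(LongT))
       val uncached = UncachedCaseWhen(branches)
       val cached = CachedCaseWhen(branches)
       for (_ <- 1 to calls) {
         uncached.dataType
         cached.dataType
       }
       (uncached.computations, cached.computations)
     }

     def main(args: Array[String]): Unit =
       println(computationsAfter(1000)) // prints (1000,1)
   }
   ```

   In a per-row code path, the uncached variant pays the derivation cost once per row, while the cached variant pays it once per expression instance.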
   
   
   ### Why are the changes needed?
   
   Improve query performance. For example:
   ```scala
   spark.range(100000000L).selectExpr("approx_count_distinct(case when id % 400 > 20 then id else 0 end)").show
   ```
   
   Profiling result:
   ```
   --- Execution profile ---
   Total samples       : 18365
   
   Frame buffer usage  : 2.6688%
   
   --- 58443254327 ns (31.82%), 5844 samples
     [ 0] GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)1, 131072u>, (MemoryType)1>::steal_best_of_2(unsigned int, int*, StarTask&)
     [ 1] StealTask::do_it(GCTaskManager*, unsigned int)
     [ 2] GCTaskThread::run()
     [ 3] java_start(Thread*)
     [ 4] start_thread
   
   --- 6140668667 ns (3.34%), 614 samples
     [ 0] GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)1, 131072u>, (MemoryType)1>::peek()
     [ 1] ParallelTaskTerminator::offer_termination(TerminatorTerminator*)
     [ 2] StealTask::do_it(GCTaskManager*, unsigned int)
     [ 3] GCTaskThread::run()
     [ 4] java_start(Thread*)
     [ 5] start_thread
   
   --- 5679994036 ns (3.09%), 568 samples
     [ 0] scala.collection.generic.Growable.$plus$plus$eq
     [ 1] scala.collection.generic.Growable.$plus$plus$eq$
     [ 2] scala.collection.mutable.ListBuffer.$plus$plus$eq
     [ 3] scala.collection.mutable.ListBuffer.$plus$plus$eq
     [ 4] scala.collection.generic.GenericTraversableTemplate.$anonfun$flatten$1
     [ 5] scala.collection.generic.GenericTraversableTemplate$$Lambda$107.411506101.apply
     [ 6] scala.collection.immutable.List.foreach
     [ 7] scala.collection.generic.GenericTraversableTemplate.flatten
     [ 8] scala.collection.generic.GenericTraversableTemplate.flatten$
     [ 9] scala.collection.AbstractTraversable.flatten
     [10] org.apache.spark.internal.config.ConfigEntry.readString
     [11] org.apache.spark.internal.config.ConfigEntryWithDefault.readFrom
     [12] org.apache.spark.sql.internal.SQLConf.getConf
     [13] org.apache.spark.sql.internal.SQLConf.caseSensitiveAnalysis
     [14] org.apache.spark.sql.types.DataType.sameType
     [15] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1
     [16] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1$adapted
     [17] org.apache.spark.sql.catalyst.analysis.TypeCoercion$$$Lambda$1527.1975399904.apply
     [18] scala.collection.IndexedSeqOptimized.prefixLengthImpl
     [19] scala.collection.IndexedSeqOptimized.forall
     [20] scala.collection.IndexedSeqOptimized.forall$
     [21] scala.collection.mutable.ArrayBuffer.forall
     [22] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.haveSameType
     [23] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck
     [24] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck$
     [25] org.apache.spark.sql.catalyst.expressions.CaseWhen.dataTypeCheck
     [26] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType
     [27] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType$
     [28] org.apache.spark.sql.catalyst.expressions.CaseWhen.dataType
     [29] org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus.update
     [30] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$2
     [31] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$2$adapted
     [32] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$Lambda$1534.1383512673.apply
     [33] org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7
     [34] org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7$adapted
     [35] org.apache.spark.sql.execution.aggregate.AggregationIterator$$Lambda$1555.725788712.apply
   ```
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Manual test
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802100


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128983/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668112


   **[Test build #128970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805








[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r496663486



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
   @transient
   lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
 
+  private lazy val internalDataType: DataType = {

Review comment:
       Can we put it right before the `override def dataType: DataType` line?






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696761600


   **[Test build #128983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696762294








[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494810795



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
-    dataTypeCheck

Review comment:
       Do you mean we should add it back?






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696960524


   **[Test build #128985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547533


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128957/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696780076








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805










[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894033








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599


   **[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599








[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494807719



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
-    dataTypeCheck

Review comment:
       After checking the code, it seems `dataTypeCheck` is not called in `checkInputDataTypes`.






[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492551773



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
     children: Seq[Expression] = Seq.empty,
     exprId: ExprId = NamedExpression.newExprId)
   extends SubqueryExpression(plan, children, exprId) with Unevaluable {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {

Review comment:
       Does this need to be a lazy val? It seems to be a very cheap method.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
 case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       Shall we just remove this line? The input data type check is part of the resolution procedure, so we don't need to do it again when accessing the data type.
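
       A rough sketch of that separation, under the assumption that the resolution step really does run the same check. It uses a toy model, not Spark's API: `ElemType`, `ArrayOf`, `ToyArrayExcept`, `checkInputTypes`, and `ResolutionSketch` are hypothetical names.

       ```scala
       // Toy model with hypothetical names; it only mirrors the shape of the suggestion.
       sealed trait ElemType
       case object IntElem extends ElemType
       case object StringElem extends ElemType

       final case class ArrayOf(element: ElemType)

       final case class ToyArrayExcept(left: ArrayOf, right: ArrayOf) {
         // Run once by the (hypothetical) resolution step, before any row is processed.
         def checkInputTypes(): Either[String, Unit] =
           if (left.element == right.element) Right(())
           else Left(s"cannot merge ${left.element} with ${right.element}")

         // No validation here: callers are assumed to have passed checkInputTypes() first.
         def dataType: ArrayOf = left
       }

       object ResolutionSketch {
         // Validate up front; dataType can then be read freely in the hot path.
         def resolve(e: ToyArrayExcept): Either[String, ArrayOf] =
           e.checkInputTypes().map(_ => e.dataType)

         def main(args: Array[String]): Unit = {
           println(resolve(ToyArrayExcept(ArrayOf(IntElem), ArrayOf(IntElem))))
           println(resolve(ToyArrayExcept(ArrayOf(IntElem), ArrayOf(StringElem))))
         }
       }
       ```

       If the check is not actually part of resolution, dropping it from `dataType` removes the only validation, so the assumption above needs to be confirmed first.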

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,16 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       ditto

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
 
   @transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {

Review comment:
       Is it expensive? It just creates a few objects.






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694524598


   **[Test build #128836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599


   **[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694338060








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694841883


   **[Test build #128864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] wangyum commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696550307


   retest this please




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700709523


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33855/
   




[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492812835



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
 case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       +1






[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551094


   **[Test build #128970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329270








[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492553171



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
 
   @transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {

Review comment:
       Is it expensive? It just creates a few objects.






[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492551773



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
     children: Seq[Expression] = Seq.empty,
     exprId: ExprId = NamedExpression.newExprId)
   extends SubqueryExpression(plan, children, exprId) with Unevaluable {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {

Review comment:
       Does this need to be a lazy val? It seems like a very cheap method.






[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494810795



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
-    dataTypeCheck

Review comment:
       Do you mean I should add it back?






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694328507


   **[Test build #128834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702274


   **[Test build #128864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547519


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702274


   **[Test build #128864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694338060








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694341541


   **[Test build #128836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).




[GitHub] [spark] wangyum commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696550307


   retest this please




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696783561


   **[Test build #128985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329930


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551094


   **[Test build #128970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696761600


   **[Test build #128983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694842890








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894015


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33789/
   




[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492473081



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
   @transient
   lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
 
+  private lazy val internalDataType: DataType = {

Review comment:
       If we only change:
   ```scala
     override def dataType: DataType = {
       dataTypeCheck
       inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
     }
   ```
   to 
   ```scala
     lazy val dataType: DataType = {
       dataTypeCheck
       inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
     }
   ```
   the build fails with:
   ```java
   [error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:558: super may not be used on lazy value dataType
   [error]       super.dataType.asInstanceOf[MapType]
   [error]             ^
   [error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:2077: super may not be used on lazy value dataType
   [error]       super.dataType
   [error]             ^
   [error] two errors found
   
   ```
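
The compile error above can be reproduced outside Spark. The following is a minimal sketch, assuming simplified stand-in types: `DataType`, `Expression`, `MergingExpression`, and `ConcatLike` are hypothetical and are not the Catalyst classes. It shows why the cached value lives in a private `lazy val` while `dataType` stays a `def`: a subclass can then keep calling `super.dataType`.

```scala
// Simplified stand-ins for illustration only; not Spark's real classes.
sealed trait DataType
case object LongType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

trait Expression {
  def dataType: DataType
}

// Hypothetical counterpart of ComplexTypeMergingExpression.
trait MergingExpression extends Expression {
  def inputTypes: Seq[DataType]

  // Counts how often the merged type is actually built (illustration only).
  var constructions: Int = 0

  // Computed on first access, then cached for the lifetime of the instance.
  private lazy val internalDataType: DataType = {
    constructions += 1
    inputTypes.head
  }

  // Remains a def, so subclasses may still refer to super.dataType.
  override def dataType: DataType = internalDataType
}

final case class ConcatLike(inputTypes: Seq[DataType]) extends MergingExpression {
  // Legal because the parent member is a def; with `lazy val dataType` in the
  // parent, this line is rejected with "super may not be used on lazy value".
  // Note that this adjustment still runs on every call in this sketch.
  override def dataType: DataType = super.dataType match {
    case a: ArrayType => a.copy(containsNull = true)
    case other => other
  }
}
```

Calling `dataType` repeatedly on a `ConcatLike` instance builds the parent's type only once; the subclass override keeps working unchanged.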

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
 
   @transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {

Review comment:
       This is to improve this case:
   ![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
   
   
   Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
   -- | -- | --
   spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1,   id)))").collect() | 21787 | 15551
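
To illustrate why a method that "just creates a few objects" can still matter, here is a rough sketch, assuming the hot path reads `dataType` once per row. All names (`SqlType`, `TypeBuilds`, `EntriesWithDef`, `EntriesWithLazyVal`, `DataTypeCallDemo`) are hypothetical stand-ins, not Spark classes, and the sketch counts constructions rather than measuring time.

```scala
// Simplified stand-ins for illustration only; not Spark's real classes.
sealed trait SqlType
case object IntType extends SqlType
case object BigIntType extends SqlType
final case class Field(name: String, tpe: SqlType)
final case class StructOf(fields: Seq[Field]) extends SqlType
final case class ArrayOf(element: SqlType) extends SqlType

object TypeBuilds {
  // Number of times an entries type has been constructed so far.
  var count: Int = 0

  def entriesType(key: SqlType, value: SqlType): SqlType = {
    count += 1
    ArrayOf(StructOf(Seq(Field("key", key), Field("value", value))))
  }
}

// Rebuilds the type on every access, like `override def dataType = { ... }`.
final class EntriesWithDef(key: SqlType, value: SqlType) {
  def dataType: SqlType = TypeBuilds.entriesType(key, value)
}

// Builds the type once and serves the cached instance afterwards.
final class EntriesWithLazyVal(key: SqlType, value: SqlType) {
  private lazy val internalDataType: SqlType = TypeBuilds.entriesType(key, value)
  def dataType: SqlType = internalDataType
}

object DataTypeCallDemo {
  // Simulates per-row code that reads the type once per row and
  // returns how many type constructions those reads caused.
  def constructionsFor(rows: Int)(readType: () => SqlType): Int = {
    val before = TypeBuilds.count
    var i = 0
    while (i < rows) { readType(); i += 1 }
    TypeBuilds.count - before
  }
}
```

With the `def` variant the number of constructions grows with the row count, while the `lazy val` variant constructs the type once per expression instance.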
   
   
   

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
 case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       +1

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
     children: Seq[Expression] = Seq.empty,
     exprId: ExprId = NamedExpression.newExprId)
   extends SubqueryExpression(plan, children, exprId) with Unevaluable {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {

Review comment:
       I reverted this change because I could not find an expression that calls this method many times.






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699863097


   **[Test build #129174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).




[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492722255



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
 
   @transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {

Review comment:
       This is to improve this case:
   ![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
   
   
   Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
   -- | -- | --
   spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1,   id)))").collect() | 21787 | 15551
   
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802081








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725596








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700885340


   **[Test build #129238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492722255



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
 
   @transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {

Review comment:
       This is to improve this case:
   ![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
   
   
   Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
   -- | -- | --
   spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1,   id)))").collect() | 21787 | 15551
   
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547519








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700002852


   **[Test build #129174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894033








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696783561


   **[Test build #128985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696961631








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329270








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700005928








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668904








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694526069








[GitHub] [spark] cloud-fan commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694828526


   Do we really need such an invasive change? If a specific expression calls `dataType` many times, let's fix only that expression. Alternatively, if this brings a significant end-to-end performance speedup, we can consider accepting it.




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696961631








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551577








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702875








[GitHub] [spark] HyukjinKwon closed pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #29790:
URL: https://github.com/apache/spark/pull/29790


   




[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700683847


   **[Test build #129238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700886576








[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492814578



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
     children: Seq[Expression] = Seq.empty,
     exprId: ExprId = NamedExpression.newExprId)
   extends SubqueryExpression(plan, children, exprId) with Unevaluable {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {

Review comment:
       I reverted this change because I could not find any expression that calls this method many times.






[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694341541


   **[Test build #128836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).




[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492473081



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
   @transient
   lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
 
+  private lazy val internalDataType: DataType = {

Review comment:
       If we only change:
   ```scala
     override def dataType: DataType = {
       dataTypeCheck
       inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
     }
   ```
   to
   ```scala
     lazy val dataType: DataType = {
       dataTypeCheck
       inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
     }
   ```
   compilation fails, because two subclasses call `super.dataType`:
   ```java
   [error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:558: super may not be used on lazy value dataType
   [error]       super.dataType.asInstanceOf[MapType]
   [error]             ^
   [error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:2077: super may not be used on lazy value dataType
   [error]       super.dataType
   [error]             ^
   [error] two errors found
   ```
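
The workaround in the diff (keep `dataType` as a `def` and cache the result in a `private lazy val internalDataType`) can be illustrated with a minimal, Spark-free sketch. The names `MergingExpression`, `NullableMerge`, and the two-field `ArrayType` are hypothetical stand-ins chosen for this example; the nullability merge is a simplification, not the logic of `TypeCoercion.findCommonTypeDifferentOnlyInNullFlags`.

```scala
object SuperOnLazyValSketch {
  // Simplified stand-in types (hypothetical, not Spark's classes).
  sealed trait DataType
  final case class ArrayType(elementType: String, containsNull: Boolean) extends DataType

  trait MergingExpression {
    def inputTypes: Seq[ArrayType]

    // Computed once per instance; private, so subclasses never reference it directly.
    private lazy val internalDataType: DataType =
      ArrayType(inputTypes.head.elementType, inputTypes.exists(_.containsNull))

    // Still a def, so subclasses may call `super.dataType`.
    def dataType: DataType = internalDataType
  }

  final case class NullableMerge(inputTypes: Seq[ArrayType]) extends MergingExpression {
    // If the parent declared `lazy val dataType`, this call would be rejected with
    // "super may not be used on lazy value dataType", as in the error output above.
    override def dataType: DataType = super.dataType match {
      case a: ArrayType => a.copy(containsNull = true)
    }
  }
}
```

One consequence of this shape: the parent's merge is cached, but any work done in a subclass override (here the `copy`) still runs on each call unless the subclass caches its own result.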






[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599


   **[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694526069








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700886576








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329930








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702875








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668904








[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492552898



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,16 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
+
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       ditto






[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725596








[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696546740


   **[Test build #128957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725568


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33855/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492552680



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
 case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
 
-  override def dataType: DataType = {
+  private lazy val internalDataType: DataType = {
     dataTypeCheck

Review comment:
       Shall we just remove this line? The input data type check is part of the resolution procedure, so we don't need to repeat it when accessing the data type.
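
A rough sketch of that separation, under stated assumptions: `BinaryArrayOp`, `resolve`, and the `Either`-returning `checkInputDataTypes` below are hypothetical simplifications written for this example, not Catalyst's actual signatures. The point is only that validation runs once during resolution and the type accessor does not repeat it.

```scala
object ResolutionCheckSketch {
  // Simplified stand-in types (hypothetical, not Spark's classes).
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType

  final case class BinaryArrayOp(left: DataType, right: DataType) {
    // Validation belongs to resolution: it runs once, before the expression is used.
    def checkInputDataTypes(): Either[String, Unit] =
      if (left == right) Right(()) else Left(s"input types differ: $left vs $right")

    // The accessor assumes the expression was already resolved, so it does not re-check.
    lazy val dataType: DataType = left
  }

  // Hypothetical resolution step: only expressions that pass the check are returned.
  def resolve(op: BinaryArrayOp): Either[String, BinaryArrayOp] =
    op.checkInputDataTypes().map(_ => op)
}
```

The trade-off is that `dataType` on an unresolved expression no longer fails fast, so callers need to go through the resolution step first.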






[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699863097


   **[Test build #129174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).




[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696762294








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696780076








[GitHub] [spark] cloud-fan commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694829038


   Or if an expression has a very complicated `def dataType`, can we change it to `lazy val dataType`?
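
   A minimal sketch of the difference between the two forms, for illustration only. The types `Expr`, `Leaf`, `CaseWhenDef` and `CaseWhenLazy` are hypothetical stand-ins and not Spark's actual classes; the `computations` counter exists only to make the number of evaluations visible.

   ```scala
   // Hypothetical, simplified type hierarchy (not Spark's DataType).
   sealed trait DataType
   case object IntType extends DataType
   case object StringType extends DataType

   // Hypothetical expression interface with an abstract `dataType`.
   trait Expr { def dataType: DataType }

   final case class Leaf(dataType: DataType) extends Expr

   // `def`: the body runs again on every access.
   final case class CaseWhenDef(branches: Seq[Expr]) extends Expr {
     var computations = 0
     override def dataType: DataType = {
       computations += 1
       branches.head.dataType
     }
   }

   // `lazy val`: the body runs on first access and the result is cached.
   final case class CaseWhenLazy(branches: Seq[Expr]) extends Expr {
     var computations = 0
     override lazy val dataType: DataType = {
       computations += 1
       branches.head.dataType
     }
   }
   ```

   In this sketch, reading `dataType` three times increments `computations` three times on `CaseWhenDef` but only once on `CaseWhenLazy`. The trade-off is one extra field per instance plus the synchronization that a `lazy val` carries in Scala.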




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700005941








[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694842890








[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494807719



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
   since = "2.4.0")
 case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
   with ComplexTypeMergingExpression {
-  override def dataType: DataType = {
-    dataTypeCheck

Review comment:
       After checking the code, it seems `dataTypeCheck` is not part of `checkInputDataTypes`.
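
   One possible shape for separating the two concerns, as a sketch under stated assumptions rather than the change made in this PR: validation lives in a check method that runs once, and `dataType` does no validation of its own. `Expr`, `Leaf` and `ArrayIntersectSketch` are hypothetical names, and the `Either[String, Unit]` return type is a simplification, not Spark's actual check result type.

   ```scala
   // Hypothetical, simplified type hierarchy (not Spark's DataType).
   sealed trait DataType
   case object IntType extends DataType
   case object StringType extends DataType
   final case class ArrayType(elementType: DataType) extends DataType

   trait Expr { def dataType: DataType }
   final case class Leaf(dataType: DataType) extends Expr

   // Hypothetical binary array expression.
   final case class ArrayIntersectSketch(left: Expr, right: Expr) extends Expr {
     // Validation is meant to run once, e.g. during analysis in this sketch.
     def checkInputDataTypes(): Either[String, Unit] =
       (left.dataType, right.dataType) match {
         case (ArrayType(l), ArrayType(r)) if l == r => Right(())
         case (l, r) => Left(s"expected two arrays of the same type, got $l and $r")
       }

     // No validation here: assumes checkInputDataTypes() already succeeded.
     override lazy val dataType: DataType = left.dataType
   }
   ```

   The sketch only works if every caller goes through the check before reading `dataType`, which is exactly the assumption being questioned in this comment.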






[GitHub] [spark] HyukjinKwon commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-703615807


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329911


   **[Test build #128834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802081


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700683847


   **[Test build #129238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699884789


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33789/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329946


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128834/
   Test FAILed.




[GitHub] [spark] maropu commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r490646094



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
##########
@@ -400,7 +403,10 @@ case class Expm1(child: Expression) extends UnaryMathExpression(StrictMath.expm1
   """,
   since = "1.4.0")
 case class Floor(child: Expression) extends UnaryMathExpression(math.floor, "FLOOR") {
-  override def dataType: DataType = child.dataType match {
+
+  private lazy val childDataType = child.dataType

Review comment:
       This looks like a pretty common code pattern, so could we move it into the base classes, e.g., as a lazy val in `UnaryExpression`?
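
   A rough sketch of what hoisting the cached child type into a base class could look like. Everything here is hypothetical: `UnaryExpr` stands in for a base class such as `UnaryExpression`, `CountingChild` exists only to count reads, and the decimal mapping in `Floor` is a simplified stand-in rather than Spark's exact rule.

   ```scala
   // Hypothetical, simplified type hierarchy (not Spark's DataType).
   sealed trait DataType
   case object LongType extends DataType
   case object DoubleType extends DataType
   final case class DecimalType(precision: Int, scale: Int) extends DataType

   trait Expr { def dataType: DataType }
   final case class Literal(dataType: DataType) extends Expr

   // Test helper: counts how often its data type is read.
   final class CountingChild(underlying: DataType) extends Expr {
     var reads = 0
     override def dataType: DataType = { reads += 1; underlying }
   }

   // Hypothetical base class: caches the child's type once for all subclasses.
   abstract class UnaryExpr extends Expr {
     def child: Expr
     protected lazy val childDataType: DataType = child.dataType
   }

   final case class Floor(child: Expr) extends UnaryExpr {
     // Simplified mapping, for illustration only.
     override def dataType: DataType = childDataType match {
       case dt @ DecimalType(_, 0) => dt
       case DecimalType(p, s)      => DecimalType(p - s + 1, 0)
       case _                      => LongType
     }
   }
   ```

   With this layout each subclass reads `childDataType` instead of declaring its own private `lazy val`, at the cost of an extra cached field on every unary expression, including those that never need it.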






[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694328507


   **[Test build #128834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).




[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696801765


   **[Test build #128983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551577





